CAUSAL INFERENCE FOR CONTINUOUS-TIME PROCESSES
Most of the work on the structural nested model and g-estimation
for causal inference in longitudinal data assumes a discrete-time un- 
derlying data generating process. However, in some observational 
studies, it is more reasonable to assume that the data are generated 
from a continuous-time process and are only observable at discrete 
time points. When these circumstances arise, the sequential random- 
ization assumption in the observed discrete-time data, which is essen- 
tial in justifying discrete-time g-estimation, may not be reasonable. 
Under a deterministic model, we discuss other useful assumptions 
that guarantee the consistency of discrete-time g-estimation. In more 
general cases, when those assumptions are violated, we propose a 
controlling-the-future method that performs at least as well as g- 
estimation in most scenarios and which provides consistent estima- 
tion in some cases where g-estimation is severely inconsistent. We 
apply the methods discussed in this paper to simulated data, as well 
as to a data set collected following a massive flood in Bangladesh, 
estimating the effect of diarrhea on children's height. Results from 
different methods are compared in both simulation and the real ap- 
plication. 

1. Introduction and motivation. In this paper, we study assumptions 
and methods for making causal inferences about the effect of a treatment 
that varies in continuous time when its time-dependent confounders are 
observed only at discrete times. Examples of settings in which this prob- 
lem arises are given in Section 1.2. In such settings, standard discrete-time 
methods such as g-estimation usually do not work, except when certain 
conditions are assumed for the continuous-time process. In this paper, we
formulate such conditions. When these conditions do not hold, we propose a 
controlling-the-future method which can produce consistent estimates when 
g-estimation is consistent and which is still consistent in some cases when 
g-estimation is severely inconsistent. 

First, we review the approach of James Robins and collaborators to mak- 
ing causal inferences about the effect of a treatment that varies at discrete, 
observed times. 

1.1. Review of Robins' causal inference approach for treatments varying 
at discrete, observed times. In a cross-sectional observational study of the 
effect of a treatment on an outcome, a usual assumption for making causal 
inferences is that there are no unmeasured confounders, that is, that
conditional on the measured confounders, the data are generated as if the treatment
were assigned randomly. Under this assumption, a consistent estimate of the 
average causal effect of the treatment can be obtained from a correct model 
of the association between the treatment and the outcome conditional on the 
measured confounders [Cochran (1965)]. In a longitudinal study, the analog 
of the "no unmeasured confounders" assumption is that at the time of each 
treatment assignment, there are no unmeasured confounders; this is called 
the sequential randomization or sequential ignorability assumption, given as 
follows. 

(A1) The longitudinal data of interest are generated as if the treatment
is randomized in each period, conditional on the current values of measured 
covariates and the history of the measured covariates and the treatment. 

The sequential randomization assumption implies that the decision on treatment
assignment is based on observable history and contemporaneous covariates, 
and that people have no ability to see into the future. Robins (1986) has 
shown that for a longitudinal study, unlike for a cross-sectional study, even 
if the sequential randomization assumption holds, the standard method of 
estimating the causal effect of the treatment by the association between 
the outcome and the treatment history conditional on the confounders can 
provide a biased and inconsistent estimate. This bias can occur when we are 
interested in estimating the joint effects of all treatment assignments and 
when the following conditions hold: 

(c1) conditional on past treatment history, a time-dependent variable is
a predictor of the subsequent mean of the outcome and also a predictor of 
subsequent treatment; 

(c2) past treatment history is an independent predictor of the time- 
dependent variable. 
Here, "independent predictor" means that prior treatment predicts current
levels of the covariate, even after conditioning on other covariates. An exam- 
ple in which the standard methods are biased is the estimation of the causal 
effect of the drug AZT (zidovudine) on CD4 counts in AIDS patients. Past 
CD4 count is a time-dependent confounder for the effect of AZT on future 
CD4 count since it not only predicts future CD4 count, but also subsequent 
initiation of AZT therapy. Also, AZT history is an independent predictor of 
subsequent CD4 count [e.g., Hernan, Brumback and Robins (2002)]. 

To eliminate the bias of standard methods for estimating the causal effect 
of treatment in longitudinal studies where sequential randomization holds 
but there are time-dependent confounders satisfying conditions (c1) and (c2)
(e.g., past CD4 counts), Robins (1986, 1992, 1994, 1998, 2000) developed a 
number of innovative methods. We focus here on structural nested models 
(SNMs) and their associated methods of g-testing and g-estimation. The ba- 
sic idea of the g-test is the following. Given a hypothesized treatment effect 
and a deterministic model of the treatment effect, we can calculate the
potential outcome that a subject would have had if she never received the
treatment. Such an outcome is also known as a counterfactual outcome, which is
the outcome under a treatment history that might be contrary to the real- 
ized treatment history. If the hypothesized treatment effect is the true treat- 
ment effect, then this potential outcome will be independent of the actual 
treatment the subject received conditional on the confounder and treatment 
history, under the sequential randomization assumption (A1). g-estimation
involves finding the treatment effect that makes the g-test statistic have its 
expected null value. For simplicity, our exposition focuses on determinis- 
tic rank-preserving structural nested distribution models; g-estimation also 
works for nondeterministic structural nested distribution models. 

The SNM and g-estimation were developed for settings in which treatment 
decisions are being made at discrete times at which all the confounders are 
observed. In some settings, the treatment is varying in continuous time, but 
confounders are only observed at discrete times. 

1.2. Examples of treatments varying in continuous time where covariates 
are observed only at discrete times. 

Example 1 (The effect of diarrhea on children's height). Diarrheal dis- 
ease is one of the leading causes of childhood illness in developing regions 
[Kosek, Bern and Guerrant (2003)]. Consequently, there is considerable con- 
cern about the effects of diarrhea on a child's physical and cognitive de- 
velopment [Moore et al. (2001), Guerrant et al. (2002)]. A data set which 
provides the opportunity to study the impact of diarrhea on a child's height 
is a longitudinal household survey conducted in Bangladesh in 1998-1999 






M. ZHANG, M. M. JOFFE AND D. S. SMALL 



after Bangladesh was struck by its worst flood in over a century in the sum- 
mer of 1998 [del Ninno et al. (2001), del Ninno and Lundberg (2005)]. The 
survey was fielded in three waves from a sample of 757 households: round 
1 in November, 1998; round 2 in March-April, 1999; round 3 in Novem- 
ber, 1999. The survey recorded all episodes of diarrhea for each child in the 
household in the past six months or since the last interview by asking the 
families at the time of each interview. In addition, the survey recorded at 
each of the three interview times several important time-dependent covari- 
ates for the effect of diarrhea on a child's future height: the child's current 
height and weight; the amount of flooding in the child's home and village; 
the household's economic and sanitation status. In particular, the child's 
current height and weight are time-dependent confounders that satisfy
conditions (c1) and (c2), making standard longitudinal data analysis methods
biased [see Martorell and Ho (1984) and Moore et al. (2001) for discussion
of evidence for and reasons why current height and weight satisfy conditions
(c1) and (c2)]. The time-dependent confounders of current height and weight
are available only at the time of the interview, and changes in their value 
that might affect the exposure of the child to the "treatment" of diarrhea, 
which varies in continuous time, are not recorded in continuous time. 

Example 2 [The effect of AZT (Zidovudine) on CD4 counts]. The Mul- 
ticenter AIDS Cohort Study [MACS, Kaslow et al. (1987)] has been used 
to study the effect of AZT on CD4 counts [Hernan, Brumback and Robins 
(2002), Brumback et al. (2004)]. Participants in the study are asked to come 
semi-annually for visits at which they are asked to complete a detailed in- 
terview, including a complete history of their AZT use, as well as to take a 
physical examination. Decisions on AZT use are made by subjects and their 
physicians, and switches of treatment might happen at any time between 
two visits. These decisions are based on the values of diagnostic variables, 
possibly including CD4 and CD8 counts, and the presence of certain symp- 
toms. However, these covariates are only measured by MACS at the time 
of visits; the values of these covariates at the exact times that treatment 
decisions are made between visits are not available. 

1.3. A model data generating process. In both the examples of AZT and 
diarrhea, the exposure or treatment process happens continuously in time 
and a complete record of the process is available, but the time-dependent 
confounders are only observed at discrete times. There could be various 
interpretations of the relationship between the data at the treatment decision 
level and the data at the observational time level. To clarify the problem 
of interest in this paper, we consider a model data generating process that 
satisfies all of the following assumptions: 
(a1) a patient takes a certain medicine under the advice of a doctor;

(a2) a doctor continuously monitors and records a list of health indicators 
of her patient and decides the initiation and cessation of the medicine solely 
based on current and historical records of these conditions, the historical 
use of the medicine and possibly random factors unrelated to the patient's 
health; 

(a3) a third party organization asks a collection of patients from vari- 
ous doctors to visit the organization's office semi-annually; the organization 
measures the same list of health indicators for the patients during their vis- 
its and asks the patients to report the detailed history of the use of the 
medicine between two visits; 

(a4) we are only provided with the third party's data. 

Note that in (a2), we assume the sequential randomization assumption (A1)
at the treatment decision level. 

The AZT example can be approximated by the above data generating 
process. In the AZT example, (a1) and (a2) approximately describe the joint
decision-making process by the patient and the doctor in the real world.
(a3) can be justified by reasonably assuming that the staff at the MACS 
receive similar medical training and use similar medical equipment as the 
patients' doctors. In the diarrhea example, the patient's body, rather than 
a doctor, determines whether the patient gets diarrhea. Assumption (a3), 
then, is saying that the third party organization (the survey organization) 
collects enough health data and that if all the histories of such health data 
are available, the occurrence of diarrhea is conditionally independent of the 
potential height. 

1.4. Difficulties posed by treatments varying in continuous time when co- 
variates are observed only at discrete times. Suppose our data are gener- 
ated as in the previous section and we apply discrete-time g-estimation at the 
discrete times at which the time-dependent covariates are observed; we will 
denote these observation times by 0, ..., K. In discrete-time g-estimation,
we are testing whether the observed treatment at time t (t = 0, ..., K) is,
conditional on the observed treatments at times 0, 1, ..., t - 1 and observed
covariates at times 0, ..., t, independent of the putative potential outcomes
at times t + 1, ..., K, calculated under the hypothesized treatment effect,
where the putative potential outcomes considered are what the subject's
outcome would be at times t + 1, ..., K if the subject never received
treatment at any time point. The difficulty with this procedure is that even if
sequential randomization holds when the measured confounders are mea- 
sured in continuous time [as is assumed in (a2)], it may not hold when the 
measured confounders are measured only at discrete times. For the discrete- 
time data, there can be unmeasured confounders. In the MACS example,
the diagnostic measures at the time of AZT initiation are missing unless the
start of AZT initiation occurred exactly at one of the discrete times that 
the covariates are observed; the diagnostic measures at the initiation time 
are clearly important confounders for the treatment status at the subsequent 
observational time. In the diarrhea example, the nutrition status of the child 
before the start of a diarrhea episode is missing unless the start of the di- 
arrhea episode occurred exactly at one of the discrete times that covariates 
are observed; this nutrition status is also an important confounder for the 
diarrhea status at the subsequent observational time. Continuous-time se- 
quential randomization does not, in general, justify sequential randomization 
holding for the discrete-time data, meaning that discrete-time g-estimation 
can produce inconsistent estimates, even when continuous-time sequential 
randomization holds. 

In this paper, we approach this problem from two perspectives. First, we 
give conditions on the underlying continuous-time processes under which 
discrete-time sequential randomization is implied, warranting the use of 
discrete-time g-estimation. Second, we propose a new estimation method, 
called the controlling-the-future method, that can produce consistent esti- 
mates whenever discrete-time g-estimation is consistent and can produce 
consistent estimates in some cases where discrete-time g-estimation is in- 
consistent. 

Our discussion focuses on a binary treatment and repeated continuous 
outcomes. We also assume that the cumulative amount of treatment between 
two visits is observed. This is true for Examples 1 and 2, the diarrhea and
AZT studies, respectively. If cumulative treatment is not observed, there
will often be a measurement error problem in the amount of treatment, 
which is beyond the scope of this paper and an issue which we are currently 
researching. 

The organization of the paper is as follows: Section 2 reviews the standard 
discrete-time structural nested model and g-estimation, describes a modified 
application when the underlying process is in continuous time and proposes 
conditions on the continuous-time processes under which it works; Section 3
describes our controlling-the-future method; Section 4 presents a simulation
study; Section 5 provides an application to the diarrhea study discussed in 
Example 1; Section 6 concludes the paper. 

2. A modified g-estimation for discretely observed continuous-time pro- 
cesses. In this section, we first review the discrete-time structural nested 
model and the standard g-estimation, and mathematically formalize the 
setting we described in Section 1.3. Then, with a slight modification and 
different interpretation of notation, the g-estimation can be applied to the 
discrete-time observations from the continuous-time model. We will show 
that under certain conditions, this estimation method is consistent. 
2.1. Review of discrete-time structural nested model and g-estimation.
To reduce notation for the continuous-time setting, we use a star super- 
script on every variable in this section. 

Assuming that all variables can only change values at times 0, 1, 2, ..., K,
we use $A^*_k$ to denote the binary treatment decision at time k. Under the
discrete-time setup, $A^*_k$ is assumed to be the constant level of treatment
between time k and time (k + 1). We use $Y^{0*}_k$ to denote the baseline potential
outcome of the study at time k if the subject does not receive any treatment
throughout the study and $Y^*_k$ to denote the actual outcome at time k. In
this paper, we assume that all $Y^{0*}_k$'s and $Y^*_k$'s are continuous variables. Let
$L^*_k$ be the vector of covariates collected at time k. As a convention, $Y^*_k$ is
included in $L^*_k$.

We consider a simple deterministic model for the purposes of illustration, 

(1)    $Y^*_k = Y^{0*}_k + \Psi \sum_{i=0}^{k-1} A^*_i$,

where $\Psi$ is the causal parameter of interest and can be interpreted as the
effect of one unit of the treatment on the outcome. 

Model (1) is known as a rank-preserving model [Robins (1992)]. Under this 
model, for subjects i and j who have the same observed treatment history 
up to time k, if we observe $Y^*_{i,k} < Y^*_{j,k}$, then we must have $Y^{0*}_{i,k} < Y^{0*}_{j,k}$. It is
also stronger than a more general rank-preserving model since $Y^*_k$ depends
deterministically only on $Y^{0*}_k$ and the $A^*_k$'s.

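Causal inference aims to estimate $\Psi$ from the observables, the $A^*_k$'s and
$L^*_k$'s. One way to achieve the identification of $\Psi$ is to assume sequential
randomization (A1). Given this notation and model (1), a mathematical
formulation of (A1) is

(2)    $P(A^*_k \mid \bar{L}^*_k, \bar{A}^*_{k-1}, \underline{Y}^{0*}_{k+}) = P(A^*_k \mid \bar{L}^*_k, \bar{A}^*_{k-1})$,

where $\bar{L}^*_k = (L^*_0, L^*_1, \dots, L^*_k)$, $\bar{A}^*_{k-1} = (A^*_0, A^*_1, \dots, A^*_{k-1})$ and
$\underline{Y}^{0*}_{k+} = (Y^{0*}_{k+1}, Y^{0*}_{k+2}, \dots, Y^{0*}_K)$.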

For any hypothesized value of $\Psi$, we define a putative potential outcome,

$Y^*_k(\Psi) = Y^*_k - \Psi \sum_{i=0}^{k-1} A^*_i$.

Then, under (1) and (2), the correct $\Psi$ should solve

(3)    $E[U(\Psi)] = E\Bigl\{ \sum_{\substack{k < m \le K \\ 1 \le i \le N}} [A^*_{i,k} - p_k(X^*_{i,k})]\, g(Y^*_{i,m}(\Psi), X^*_{i,k}) \Bigr\} = 0$,

where i is the index for each subject where there are N subjects, $X^*_{i,k} =
(\bar{L}^*_{i,k}, \bar{A}^*_{i,k-1})$, $p_k(X^*_{i,k}) = P(A^*_{i,k} = 1 \mid X^*_{i,k})$ is the propensity score for subject i
at time k and g is any function. This estimating equation can be generalized,
with g being a function of any number of future $Y^*_{i,m}(\Psi)$'s and $X^*_{i,k}$.
To estimate $\Psi$, we solve the empirical version of (3):

(4)    $U(\Psi) = \sum_{\substack{k < m \le K \\ 1 \le i \le N}} [A^*_{i,k} - p_k(X^*_{i,k})]\, g(Y^*_{i,m}(\Psi), X^*_{i,k}) = 0$.

If the true propensity score model is unknown and is parameterized as
$p_k(X^*_{i,k}; \beta)$, additional estimating equations are needed to identify $\beta$. For
example, the following estimating equations could be used:

(5)    $U(\Psi, \beta) = \sum_{\substack{k < m \le K \\ 1 \le i \le N}} [A^*_{i,k} - p_k(X^*_{i,k}; \beta)]\, [g(Y^*_{i,m}(\Psi), X^*_{i,k}), X^*_{i,k}]^T = 0$.

The method is known as g-estimation. The efficiency of the estimate
depends on the functional form of g. The optimal g function that produces
the most efficient estimation can be derived [Robins (1992)]. The formulas
for estimating the covariance matrix of the estimated parameters are given in
Appendix A. A short discussion of the existence of the solution to the
estimating equation and identification can be found in Appendix B.
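
As an illustration of how equation (4) can be solved, the following is a minimal Python sketch, not the authors' implementation. It rests on several assumptions that are not in the text: a hypothetical simulation design in which the treatment-free outcome follows a Gaussian random walk, a propensity score that depends only on the current outcome and is treated as known, and the choice g(y, x) = y. Under these choices the estimating equation is linear in $\Psi$ and has a closed-form root. The function names `simulate_subject` and `g_estimate` are illustrative only.

```python
import math
import random


def expit(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_subject(psi, K, rng, a0=0.0, a1=-0.5):
    """Simulate one subject under model (1).

    Assumed (hypothetical) design: the treatment-free outcome is a Gaussian
    random walk, and treatment at time k depends only on the observed Y*_k.
    """
    y0 = rng.gauss(0.0, 1.0)   # treatment-free outcome at time 0
    cum = 0                    # sum of A*_i for i < k
    ys, treatments, props = [], [], []
    for k in range(K + 1):
        y = y0 + psi * cum     # model (1)
        ys.append(y)
        if k < K:
            p = expit(a0 + a1 * y)            # true propensity, assumed known
            a = 1 if rng.random() < p else 0
            treatments.append(a)
            props.append(p)
            cum += a
            y0 += rng.gauss(0.0, 0.5)         # treatment-free evolution to k + 1
    return ys, treatments, props


def g_estimate(data):
    """Solve the empirical estimating equation (4) with g(y, x) = y.

    With this g, U(psi) = s1 - psi * s2, so the root is s1 / s2.
    """
    s1 = s2 = 0.0
    for ys, treatments, props in data:
        K = len(treatments)
        cums = [0]
        for a in treatments:
            cums.append(cums[-1] + a)         # cums[m] = sum of A*_i for i < m
        for k in range(K):
            resid = treatments[k] - props[k]  # A*_k - p_k(X*_k)
            for m in range(k + 1, K + 1):
                s1 += resid * ys[m]
                s2 += resid * cums[m]
    return s1 / s2


rng = random.Random(0)
data = [simulate_subject(0.5, 3, rng) for _ in range(20000)]
psi_hat = g_estimate(data)
print(psi_hat)
```

In this sketch the estimate should land near the simulated value of 0.5 because sequential randomization holds by construction. With an unknown propensity score, the extra equations in (5) would have to be solved jointly, which this sketch does not attempt.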

2.2. A continuous-time deterministic model and continuous-time sequen- 
tial randomization. We now extend the model in Section 2.1 to a continuous- 
time model and define a continuous-time version of the sequential random- 
ization assumption (Al) as a counterpart of (2). 

We now assume that the variables can change their values at any real time 
between 0 and K. The model in Section 2.1 is then extended as follows:

• $\{Y_t; 0 \le t \le K\}$ is the continuous-time, continuously valued outcome
process;

• $\{L_t; 0 \le t \le K\}$ is the continuous-time covariate process; it can be
multidimensional and $Y_t$ is an element of $L_t$;

• $\{A_t; 0 \le t \le K\}$ is the continuous-time binary treatment process;

• $\{Y^0_t; 0 \le t \le K\}$ is the continuous-time, continuously valued
potential outcome process if the subject does not receive any treatment from time
0 to time K; it can be thought of as the natural process of the subject, free
of treatment/intervention.

As a regularity condition, we further assume that all of the continuous- 
time stochastic processes are cadlag processes (i.e., continuous from the 
right, having limits from the left) throughout this paper. 
A natural extension of model (1) is

(6)    $Y_t = Y^0_t + \Psi \int_0^t A_s\, ds$,

where $\Psi$ is the causal parameter of interest and can be interpreted as the
effect rate of the treatment on the outcome.

In this continuous-time model, a continuous-time version of the sequential 
randomization assumption (A1) or, equivalently, assumption (a2), can be
formalized, although it does not have a simple form similar to equation (2). 
It was noted by Lok (2008) that a direct extension of the formula (2) involves 
"conditioning null events on null events." 

Lok (2008) formally defined continuous-time sequential randomization 
when there is only one outcome at the end of the study. We propose a similar 
definition for studies with repeated outcomes under the deterministic model 
(6). 

Let $Z_t = (L_t, A_t, Y^0_t)$. Let $\sigma(Z_t)$ be the $\sigma$-field generated by $Z_t$, that is,
the smallest $\sigma$-field that makes $Z_t$ measurable. Let $\sigma(\bar{Z}_t)$ be the $\sigma$-field
generated by $\bigcup_{u \le t} \sigma(Z_u)$. Similarly, $\sigma(\bar{Z}_t, \underline{Y}^0_{t+})$ is the $\sigma$-field generated by
$\sigma(\bar{Z}_t) \cup \sigma(\underline{Y}^0_{t+})$, where $\sigma(\underline{Y}^0_{t+})$ is the $\sigma$-field generated by $\bigcup_{u > t} \sigma(Y^0_u)$. By
definition, the sequence of $\sigma(\bar{Z}_t)$, $0 \le t \le K$, forms a filtration. The
sequence of $\sigma(\bar{Z}_t, \underline{Y}^0_{t+})$, $0 \le t \le K$, also forms a filtration because
$\sigma(\bar{Z}_t, \underline{Y}^0_{t+}) \subset \sigma(\bar{Z}_s, \underline{Y}^0_{s+})$ for $t < s$ [note that this is true under the
deterministic model (6), but not in general].

Let $N_t$ be a counting process determined by $A_t$: it counts the number of
jumps in the $A_t$ process. Let $\lambda_t$ be a version of the intensity process of $N_t$
with respect to $\sigma(\bar{Z}_t)$. Then $M_t = N_t - \int_0^t \lambda_s\, ds$ is a martingale with respect
to $\sigma(\bar{Z}_t)$.

Definition 1. With $N_t$ and $M_t$ defined as above, the cadlag process
$Z_t = (L_t, A_t, Y^0_t)$, $0 \le t \le K$, is said to satisfy the continuous-time sequential
randomization assumption, or CTSR, if $M_t$ is also a martingale with respect
to $\sigma(\bar{Z}_t, \underline{Y}^0_{t+})$ or, equivalently, if there exists a $\lambda_t$ that is the intensity of $N_t$
with respect to both the filtration of $\sigma(\bar{Z}_t, \underline{Y}^0_{t+})$ and the filtration of $\sigma(\bar{Z}_t)$.

In this definition, given $A_0$, the counting process $\{N_t\}_0^K$ offers an
alternative description of the treatment process $\{A_t\}_0^K$. The intensity process $\lambda_t$,
which models the jumping rate of $N_t$, plays the same role as the propensity
scores in the discrete-time model, which models the switching of the
treatment process. Definition 1 formalizes assumption (A1) in the
continuous-time model, by stating that $\lambda_t$ does not depend on future potential
outcomes.

The definition can be generalized if $A_t$ has more than two levels, where
$N_t$ can be a multivariate counting process, each element counts a type of
jump of the $A_t$ process and $\lambda_t$ is the multivariate intensity process for $N_t$
under both the filtration of $\sigma(\bar{Z}_t)$ and the filtration of $\sigma(\bar{Z}_t, \underline{Y}^0_{t+})$; see Lok
(2008).

2.3. A modified g-estimation. In this paper, we assume that the
continuous process defined in Section 2.2 can only be observed at integer times,
namely, times 0, 1, 2, . . . , K. We use the same starred notation as in Section 
2.1, but interpret instances of this as discrete-time observations from the 
model in Section 2.2. Specifically: 

• $\{A^*_k, k = 0, 1, 2, \dots, K\}$ denotes the set of treatment assignments
observable at times 0, 1, 2, ..., K. We use $\bar{A}^*_k$ to denote the history
of observed discrete-time treatment up to time k, that is, $(A^*_0, A^*_1, \dots,
A^*_k)$. Additionally, we use $\mathrm{cum}A^*_k = \int_0^k A_s\, ds$ to denote the cumulative
amount of treatment up to time k. Note that in the continuous-time model,
$\mathrm{cum}A^*_k \ne \sum_{k'=0}^{k-1} A^*_{k'}$, as it would in discrete-time models. We let $\overline{\mathrm{cum}A}^*_k =
(\mathrm{cum}A^*_1, \mathrm{cum}A^*_2, \dots, \mathrm{cum}A^*_k)$. We note that, in practice, people sometimes
use $A^*_k = \mathrm{cum}A^*_{k+1} - \mathrm{cum}A^*_k$ as the treatment at time k when applying
discrete-time g-estimation to discrete-time observational data. Under
deterministic models, such use of g-estimation usually requires stronger
conditions than the conditions discussed in this paper. Throughout this paper,
we define the treatment at time k as $A^*_k$.

• We define $L^*_k$, the observed covariates at time k, to be $L_{k-}$, the left
limit of L at time k, following the convention that in the discrete model,
people usually assume that the covariates are measured just before the
treatment decision at time k. $Y^*_k$ and $Y^{0*}_k$ are also defined as $Y_{k-}$ and $Y^0_{k-}$,
respectively, following the same convention. $\bar{L}^*_k$ denotes $(L^*_0, L^*_1, \dots, L^*_k)$, and
$\bar{Y}^*_k$ and $\bar{Y}^{0*}_k$ are defined accordingly. $\underline{Y}^{0*}_{k+} = (Y^{0*}_{k+1}, Y^{0*}_{k+2}, \dots, Y^{0*}_K)$.
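
To make the distinction between the cumulative treatment and the sum of the discretely observed treatments concrete, here is a small Python sketch. It assumes a right-continuous binary treatment path that flips at a given list of switch times; the helper names `treatment_at`, `cumulative_treatment` and `naive_sum`, and the example switch times, are hypothetical and chosen only for illustration.

```python
def treatment_at(t, a0, switch_times):
    """Value A_t of a right-continuous binary path starting at a0
    that flips at each switch time."""
    flips = sum(1 for s in switch_times if s <= t)
    return a0 if flips % 2 == 0 else 1 - a0


def cumulative_treatment(t, a0, switch_times):
    """Exact integral of A_s over [0, t] for the piecewise-constant path."""
    total, prev, state = 0.0, 0.0, a0
    for s in sorted(switch_times):
        if s >= t:
            break
        total += state * (s - prev)   # time spent in the current state
        prev, state = s, 1 - state
    return total + state * (t - prev)


def naive_sum(k, a0, switch_times):
    """Sum of the treatments observed at the integer times 0, ..., k - 1."""
    return sum(treatment_at(j, a0, switch_times) for j in range(k))


# Hypothetical path: untreated at time 0, switches at times 0.3, 1.6 and 2.2.
switches = [0.3, 1.6, 2.2]
for k in range(1, 4):
    print(k, cumulative_treatment(k, 0, switches), naive_sum(k, 0, switches))
```

For this path the cumulative treatment at times 1, 2 and 3 is roughly 0.7, 1.3 and 2.1, while the sums of the observed treatments are 0, 1 and 1, so the two quantities differ whenever treatment switches between visits.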

With this notation and in the spirit of g-estimation, which controls all 
observed history in the propensity score model for the treatment, we propose 
the following working estimating equation: 

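(7)    $U(\Psi) = \sum_{\substack{k < m \le K \\ 1 \le i \le N}} [A^*_{i,k} - p_k(X^*_{i,k})]\, g(Y^*_{i,m}(\Psi), X^*_{i,k}) = 0$,

where $X^*_{i,k}$ is the collection of $\bar{L}^*_{i,k}$, $\bar{A}^*_{i,k-1}$ and $\overline{\mathrm{cum}A}^*_{i,k}$, $p_k(X^*_{i,k}) = P(A^*_{i,k} =
1 \mid X^*_{i,k})$ and $Y^*_{i,m}(\Psi) = Y^*_{i,m} - \Psi\, \mathrm{cum}A^*_{i,m}$.
In practice, $p_k(X^*_{i,k})$ is unknown and has to be parameterized as $p_k(X^*_{i,k};
\beta)$, and we use different functions g to identify all of the parameters, as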
in Section 2.1. The covariance matrix of estimated parameters can be es- 
timated as in Appendix A. A discussion of the existence of a solution and 
identification can be found in Appendix B. 
The estimating equation has the same form as (4), except for two
important differences. First, the propensity score model in this section
conditions on the additional $\overline{\mathrm{cum}A}^*_k$. In the discrete-time model of Section
2.1, $\mathrm{cum}A^*_k$ would be a transformed version of $\bar{A}^*_{k-1}$ and was redundant
information. However, with continuous-time underlying processes, $\mathrm{cum}A^*_k$
provides new information on the treatment history. Second, the putative
potential outcome $Y^*_m(\Psi)$ is calculated by subtracting $\Psi\, \mathrm{cum}A^*_m$ from $Y^*_m$,
instead of $\Psi \sum_{i=0}^{m-1} A^*_i$. We will later refer to the g-estimation in this section as
the modified g-estimation (although it is in the true spirit of g-estimation).
The justification and limitation of using the modified g-estimation will be
discussed in Section 2.4.

We refer to the g-estimation in Section 2.1 as naive g-estimation when 
it is applied to data from a continuous-time model. When the data come 
from a continuous-time model, the naive g-estimation can be severely biased, 
as we will show in our simulation study and the diarrhea application. One 
source of bias is a measurement error problem: $\sum_{i=0}^{k-1} A^*_i$ is not the correct
measure of the treatment; another source of bias is that the important
information $\overline{\mathrm{cum}A}^*_k$ is not conditioned on in the propensity score. Although
we would not expect researchers to use naive g-estimation when the true 
cumulative treatments are available, we present the simulation and real ap- 
plication results using this method as a reference to show how severely biased 
the estimates would be had we not known the true cumulative treatments 
and the measurement error problem had dominated. 
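
The measurement error source of bias can be made explicit with a short worked comparison. The following is a sketch that uses only model (6) evaluated at an observation time m and at the true $\Psi$, writing $Y^*_m$ for the observed outcome and $\mathrm{cum}A^*_m$ for the cumulative treatment up to time m:

```latex
\begin{align*}
Y^*_m - \Psi\,\mathrm{cum}A^*_m
  &= Y^{0*}_m, \\
Y^*_m - \Psi \sum_{i=0}^{m-1} A^*_i
  &= Y^{0*}_m + \Psi \Bigl( \mathrm{cum}A^*_m - \sum_{i=0}^{m-1} A^*_i \Bigr).
\end{align*}
```

The first line is the putative outcome of the modified g-estimation, which recovers the treatment-free outcome exactly. The second line is the putative outcome of naive g-estimation, which carries an extra term that depends on the treatment path between visits and so will generally be associated with the observed treatments.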

2.4. Justification of the modified g-estimation. Given discrete-time
observational data from continuous-time underlying processes, solving
equation (7) provides an estimate of $\Psi$. For this estimate to be consistent, an
analog to condition (2) is needed:

(8)    $P(A^*_k \mid \bar{L}^*_k, \bar{A}^*_{k-1}, \overline{\mathrm{cum}A}^*_k, \underline{Y}^{0*}_{k+}) = P(A^*_k \mid \bar{L}^*_k, \bar{A}^*_{k-1}, \overline{\mathrm{cum}A}^*_k)$.

Condition (8) is a requirement on variables at observational time points. 
Its validity for a given study relies on how the data are collected, in ad- 
dition to the underlying continuous-time data generating process. It is not 
clear, without conditions on the underlying continuous-time data generat- 
ing process, how one would go about collecting data in a way such that (8) 
would hold while the standard ignorability (2) is not true. Here, we will seek 
conditions at the continuous-time process level that imply condition (8) and 
hence justify the estimating equation (7). In particular, we consider two such 
conditions. 
2.4.1. Sequential randomization at any finite subset of time points.
Recall the data generating process described in Section 1.3. The third party
organization periodically (e.g., semi-annually) collects the health data and 
treatment records of the patients. Suppose that a researcher thinks (8) holds 
for the time points at which the third party organization collects these data. 
If the time points have not been chosen in a special way to make (8) hold, 
then the researcher will often be willing to make the stronger assumption 
that (8) would hold for any finite subset of time points at which the third 
party organization chose to collect data. For example, for the diarrhea study, 
the survey was actually conducted in November, 1998, March-April, 1999 
and November, 1999. If a researcher thought (8) held for these three time 
points, then she might be willing to assume that (8) should also hold if the 
survey was instead conducted in December, 1998, February, 1999, May, 1999 
and October, 1999. 

Before formalizing the researcher's assumption on any finite subset of time 
points, we make the following observation. 

Proposition 2. Under the deterministic model assumption (6), the 
propensity score has the following property: 

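(9)    $P(A^*_k = 1 \mid \bar{L}^*_k, \bar{A}^*_{k-1}, \overline{\mathrm{cum}A}^*_k) = P(A^*_k = 1 \mid \bar{L}^*_k, \bar{A}^*_{k-1}, \bar{Y}^{0*}_k)$.

Proof. Under the deterministic assumption (6) and the correct $\Psi$,
$(\bar{L}^*_k, \bar{A}^*_{k-1}, \overline{\mathrm{cum}A}^*_k)$ is a one-to-one transformation of $(\bar{L}^*_k, \bar{A}^*_{k-1}, \bar{Y}^{0*}_k)$. □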
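
The one-to-one correspondence used in the proof can be written out explicitly. This is a sketch that assumes $\Psi \ne 0$ (an assumption added here for the inversion to be well defined) and uses the convention that the observed outcome at each visit is an element of the observed covariates. Evaluating model (6) at each observation time j = 0, ..., k gives

```latex
\begin{align*}
Y^{0*}_j &= Y^*_j - \Psi\,\mathrm{cum}A^*_j, \\
\mathrm{cum}A^*_j &= \frac{Y^*_j - Y^{0*}_j}{\Psi}, \qquad j = 0, 1, \dots, k.
\end{align*}
```

Since each $Y^*_j$ is contained in the observed covariate history, knowing the cumulative treatments is equivalent to knowing the treatment-free outcomes at the visits, which is why the two conditioning sets in (9) carry the same information.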

Using Proposition 2, we state the sequential randomization assumption 
at any finite subset of time points as follows. 

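Definition 3. A cadlag process $Z_t = (L_t, A_t, Y^0_t)$, $0 \le t \le K$, is said to
satisfy the finite-time sequential randomization assumption, or FTSR, if, for
any finite subset of time points, $0 \le t_1 < t_2 < \dots < t_n < t_{n+1} < \dots < t_{n+l} \le
K$, we have

(10)    $P(A_{t_n} \mid \bar{L}_{t_n-}, \bar{A}_{t_{n-1}}, \bar{Y}^0_{t_n-}, \underline{Y}^0_{t_n+}) = P(A_{t_n} \mid \bar{L}_{t_n-}, \bar{A}_{t_{n-1}}, \bar{Y}^0_{t_n-})$,

where $\bar{L}_{t_n-} = (L_{t_1-}, L_{t_2-}, \dots, L_{t_n-})$, $\bar{A}_{t_{n-1}} = (A_{t_1}, A_{t_2}, \dots, A_{t_{n-1}})$, $\bar{Y}^0_{t_n-} =
(Y^0_{t_1-}, Y^0_{t_2-}, \dots, Y^0_{t_n-})$ and $\underline{Y}^0_{t_n+} = (Y^0_{t_{n+1}-}, Y^0_{t_{n+2}-}, \dots, Y^0_{t_{n+l}-})$.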

It should be noted that for the conditional densities in (9) and (10), and 
the conditional densities in the following sections, we always choose the 
version that is the ratio of joint density to marginal density. 

The finite-time sequential randomization assumption, together with Proposition 2,
implies condition (8) and thus justifies the modified g-estimation equation (7). We have
also proven a result that shows the relationship between the FTSR assump- 
tion and the CTSR assumption. 
Theorem 4. If a continuous-time cadlag process Zt satisfies finite-time
sequential randomization, then, under some regularity conditions, it will also 
satisfy continuous-time sequential randomization. 

Proof. See Appendix C. The regularity conditions are also stated in 
Appendix C. □ 

The result of Theorem 4 is natural. As mentioned in Section 1.4, the 
continuous-time sequential randomization does not imply FTSR because, in 
discrete-time observations, we do not have the full continuous-time history 
to control. To compensate for the incomplete data problem, some stronger 
assumption on the continuous-time processes must be made if identification 
is to be achieved. 

2.4.2. A Markovian condition. Given the finite-time sequential random- 
ization assumption described above, two important questions arise. First, 
Theorem 4 shows that the FTSR assumption is stronger than the continuous- 
time sequential randomization assumption. It is natural to ask how much 
stronger it is than the CTSR assumption. Second, the FTSR assumption, 
unlike the CTSR assumption (A1), is not an assumption on the data generating
process itself and so it is not clear how to incorporate domain knowledge
about the data generating process to justify it. Is there a condition at the 
data generating process level which will be more helpful in deciding whether 
g-estimation is valid? 

We partially answer both questions in the following theorem. 

Theorem 5. Assuming that the process $(Y^0_t, L_t, A_t)$ satisfies the continuous-time sequential randomization assumption, and that the process $(Y^0_{t-}, L_{t-}, A_t)$ is Markovian, for any time $t$ and $t + s$, $s > 0$, we have

(11) $P(A_t \mid L_{t-}, Y^0_{t-}, \underline{Y}^0_{t+}) = P(A_t \mid L_{t-}, Y^0_{t-}),$

which implies the finite-time sequential randomization assumption. Here, $\underline{Y}^0_{t+} = (Y^0_{t_1-}, Y^0_{t_2-}, \ldots, Y^0_{t_n-})$ and $t < t_1 < t_2 < \cdots < t_n$.

Proof. The proof can be found in Appendix D. □

The theorem states that the Markov condition and the CTSR assumption
together imply the FTSR condition. Therefore, they imply condition (2) and 
thus justify the modified g-estimation equation (7). 

We make the following comments on the theorem. 

• The theorem partially answers our first question: the FTSR assumption is stronger than the CTSR assumption, but the gap between the two assumptions is no larger than a Markovian assumption. The result is not surprising since, with missing covariates between observational time points, we would hope that the variables at the observational time points summarize the missing information well. The Markovian assumption guarantees that variables at an observational time point summarize all information prior to that time point.

• The theorem also partially answers our second question. The CTSR 
assumption is usually justified by domain knowledge of how treatments are 
decided. Theorem 5 suggests that the researchers could further look for 
biological evidence that the process is Markovian to validate the use of g- 
estimation. The Markovian assumption can also be tested. One could first 
use the modified g-estimation to estimate the causal parameter, construct 
the Y° process at the observational time points and then test whether the 
full observational data of A, L, Y° come from a Markov process. A strict test 
of whether the discretely observed longitudinal data come from a continuous- 
time (usually nonstationary) Markov process could be difficult and is beyond 
the scope of this paper. As a starting point, we suggest Singer's trace in- 
equalities [Singer (1981)] as a criterion to test for the Markovian property. A 
weaker test for the Markovian property is to test conditional independence 
of past observed values and future observed values conditioning on current 
observed values. 

• In the theorem, equation (11) looks like an even stronger version of 
the continuous-time sequential randomization assumption — the treatment 
decision seems to be based only on current covariates and current potential 
outcomes. One could, of course, directly assume this stronger version of ran- 
domization and apply g-estimation. However, Theorem 5 is more useful since 
we are assuming a weaker untestable CTSR assumption and a Markovian 
assumption that is testable in principle. 

• The theorem suggests that it is sufficient to control for current co- 
variates and current potential outcomes for g-estimation to be consistent. In 
practice, we advise controlling for necessary past covariates and treatment 
history. The estimate would still be consistent if the Markovian assumption 
were true and it might reduce bias when the Markovian assumption was not 
true. As a result, we do control for previous covariates and treatments in 
our simulation and application to the diarrhea data. 

• It is worth noting that the labeling of time is arbitrary. In practice, 
researchers can label whatever they have controlled for in their propensity 
score as the "current" covariates, which could include covariates and treat- 
ments that are measured or assigned previously. In this case, the dimension 
of the process that needs to be tested for the Markovian property should 
also be expanded to include older covariates and treatments. 

• Finally, we note that a discrete-time version of the theorem is implied by Corollary 4.2 of Robins (1997) if we set, in his notation, $U_{ak}$ to be the covariates between two observational time points and $U_{bk}$ to be the null set.
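
The weaker Markov check mentioned in the comments above (conditional independence of past and future observed values given current observed values) can be illustrated with a crude linear diagnostic. The sketch below is only an illustration under stated assumptions: it uses a partial correlation, which detects linear dependence only, and the function names (`pearson`, `partial_corr`, `toy_panel`) and the toy data generator are hypothetical, not part of the paper.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(past, current, future):
    """Correlation of past and future after linearly adjusting for current."""
    r_pf = pearson(past, future)
    r_pc = pearson(past, current)
    r_cf = pearson(current, future)
    return (r_pf - r_pc * r_cf) / math.sqrt((1 - r_pc ** 2) * (1 - r_cf ** 2))


def toy_panel(n, lag2_coef, seed):
    """Three visits per subject; lag2_coef != 0 breaks the Markov property."""
    rng = random.Random(seed)
    x0 = [rng.gauss(0, 1) for _ in range(n)]
    x1 = [0.5 * a + rng.gauss(0, 1) for a in x0]
    x2 = [0.5 * b + lag2_coef * a + rng.gauss(0, 1) for a, b in zip(x0, x1)]
    return x0, x1, x2


# Near zero when the toy series is Markov, clearly positive when it is not.
markov_pc = partial_corr(*toy_panel(5000, 0.0, seed=1))
non_markov_pc = partial_corr(*toy_panel(5000, 0.4, seed=1))
```

A partial correlation far from zero would be evidence against the Markovian property; a value near zero does not establish it, since nonlinear dependence is not detected.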

Fig. 1. Directed acyclic graph. (a) DAG of a Markovian process. (b) Verification of equation (9).

As a discretized example, we illustrate the idea of Theorem 5 by a directed acyclic graph (DAG) in part (a) of Figure 1, which assumes that all variables can only change values at time points $0, 1/2, 1, 3/2, 2, \ldots, m$. Note that we do not distinguish the left limits of variables and the variables themselves in all DAGs of this paper, for reasons discussed in Appendix C. We also assume that the process can only be observed at times $0, 1, 2, \ldots, m$. It is easy to verify that the DAG satisfies sequential randomization at the $0, 1/2, 1, 3/2, 2, \ldots, m$ time level. The DAG is also Markovian in time. For example, if we control for $A_1$, $L_1$ and $Y^0_1$, any variable prior to time 1 will be d-separated from any other variable after time 1.

Part (b) of Figure 1 verifies that $A_1$ is d-separated from $Y^0_m$, $m > 1$, by the shaded variables, namely, $L_1$ and $Y^0_1$, as is implied by equation (9). By Theorem 5, the modified g-estimation works for data observed at the integer times if they are generated by the model defined by this DAG.

It is true that the Markovian condition that justifies the g-estimation 
equation (7) is restrictive, as will be discussed in the following section. 
However, our simulation study shows that g-estimation has some level of 
robustness when the Markovian assumption is not seriously violated. 



3. The controlling-the-future method. In this section, we consider situa- 
tions in which the observational time sequential randomization fails and seek 
methods that are more robust to this failure than the modified g-estimation
given in Section 2.3. The method we are going to introduce was proposed in
Joffe and Robins (2009), which deals with a more general case of the exis- 
tence of unmeasured confounders. It can be applied to deal with unmeasured 
confounders coming from either a subset of contemporaneous covariates or a 
subset of covariates that represent past time, the latter case being of interest 
for this paper. The method, which we will refer to as the controlling-the- 
future method (the reason for the name will become clear later on), gives 
consistent estimates when g-estimation is consistent and produces consistent 
estimates in some cases even when g-estimation is severely inconsistent. 

In what follows, we will first describe an illustrative application of the 
controlling-the-future method and then discuss its relationship with our 
framework of g-estimation in continuous-time processes with covariates ob- 
served at discrete times. 

3.1. Modified assumption and estimation of parameters. We assume the 
same continuous-time model as in Section 2.2. Following Joffe and Robins 
(2009), we consider a revised sequential randomization assumption on vari- 
ables at the observational time points 

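(12) $P(A^*_k \mid \bar L^*_k, \bar A^*_{k-1}, \bar Y^{0*}_k, \underline{Y}^{0*}_{k+1}) = P(A^*_k \mid \bar L^*_k, \bar A^*_{k-1}, \bar Y^{0*}_k, Y^{0*}_{k+1}).$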

This assumption relaxes (8). At each time point, conditioning on previous 
observed history, the treatment can depend on future potential outcomes, 
but only on the next period's potential outcome. In Joffe and Robins' ex- 
tended formulation, this can be further relaxed to allow for dependence on 
more than one period of future potential outcomes, as well as other forms 
of dependence on the potential outcomes. 

If the revised assumption (12) is true, then we obtain an estimating equation similar to (7). For each putative $\Psi$, we map $Y^*_k$ to

$Y^{0*}_k(\Psi) = Y^*_k - \Psi\,\mathrm{cum}A^*_k,$

the potential outcome if the subject never received any treatment under the hypothesized treatment effect $\Psi$.
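
As a small illustration of this mapping, the sketch below removes a hypothesized cumulative treatment effect from observed outcomes. The function name `blip_down` and the toy numbers are assumptions made for illustration only; they do not come from the paper.

```python
def blip_down(y_obs, cum_a, psi):
    """Return Y0*(psi) = Y* - psi * cumA* for each observation time."""
    if len(y_obs) != len(cum_a):
        raise ValueError("y_obs and cum_a must have the same length")
    return [y - psi * c for y, c in zip(y_obs, cum_a)]


# Toy subject whose outcome is generated as Y = Y0 + psi_true * cumA, psi_true = 1.
y0_true = [10.0, 10.5, 11.0, 11.2]
cum_a = [0.0, 0.7, 0.7, 1.9]          # time spent treated up to each visit
y_obs = [y0 + 1.0 * c for y0, c in zip(y0_true, cum_a)]

recovered = blip_down(y_obs, cum_a, psi=1.0)   # the correct psi recovers Y0
shifted = blip_down(y_obs, cum_a, psi=0.5)     # a wrong psi leaves residual effect
```

With the correct value of the parameter the treatment-free outcome is recovered exactly; with any other value the mapped outcome still carries part of the treatment effect, which is what the estimating equations exploit.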

Define the putative propensity score as

(13) $p_k(\Psi) = P(A^*_k = 1 \mid \bar L^*_k, \bar A^*_{k-1}, \overline{\mathrm{cum}A}^*_k, Y^{0*}_{k+1}(\Psi)).$

Under assumption (12), the correct $\Psi$ should solve

(14) $U(\Psi) = \sum_{1 \le i \le n,\; k+1 < m \le K} [A^*_{i,k} - p_{i,k}(\Psi)]\, g(Y^{0*}_{i,m}(\Psi), X^*_{i,k}, h_{i,k}(\Psi)) = 0,$

where $X^*_{i,k} = (\bar L^*_{i,k}, \bar A^*_{i,k-1}, \overline{\mathrm{cum}A}^*_{i,k})$, $h_{i,k}(\Psi) = Y^{0*}_{i,k+1}(\Psi)$ and $g$ is any function; $g$ can be generalized to functions of $X^*_{i,k}$, $h_{i,k}(\Psi)$ and any number of future potential outcomes that are later than time $k + 1$, for example, $g(Y^{0*}_{i,k+2}(\Psi), Y^{0*}_{i,k+3}(\Psi), X^*_{i,k}, h_{i,k}(\Psi))$. In most real applications, the model for $p_k(\Psi) = E[A^*_k \mid X^*_k, h_k(\Psi)]$ is unknown and is usually estimated by a parametric model,

$p_{i,k}(\Psi; \beta_X, \beta_h) = E[A^*_{i,k} \mid X^*_{i,k}, h_{i,k}(\Psi); \beta_X, \beta_h].$

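We can solve the following set of estimating equations to obtain the estimates of $\Psi$, $\beta_X$ and $\beta_h$:

(15) $U(\Psi, \beta_X, \beta_h) = \sum_{1 \le i \le n,\; k+1 < m \le K} (A^*_{i,k} - p_{i,k}(\Psi; \beta_X, \beta_h)) \times [g(Y^{0*}_{i,m}(\Psi), X^*_{i,k}, h_{i,k}(\Psi)), X^*_{i,k}, h_{i,k}(\Psi)]^T = 0.$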

The estimation of the covariance matrix of $\hat\Psi$, $\hat\beta_X$ and $\hat\beta_h$ is similar to that of standard g-estimation, which is described in Appendix A.

Two important features of estimating equation (15) distinguish it from estimating equation (7). First, in (15), there is a common parameter $\Psi$ in both the model for $p_k$ and $Y^{0*}_{i,m}(\Psi)$, caused by the fact that the treatment depends on a future potential outcome. Second, in (15), the sum over $m$ and $k$ is restricted to $m > k + 1$, while in (7), we only need $m > k$. If we use $m = k + 1$ in (15), $E\{[A^*_{i,k} - p_{i,k}(\Psi)]\, g(Y^{0*}_{i,k+1}(\Psi), X^*_{i,k}, h_{i,k}(\Psi))\} = 0$ usually does not lead to the identification of $\Psi$, unless certain functional forms of the propensity score model are assumed to be true [see Joffe and Robins (2009)].

3.2. The controlling-the-future method and the Markovian condition. Joffe
and Robins' revised assumption (12) is an assumption on the discrete-time 
observational data. It relaxes the observational time sequential randomiza- 
tion (8) because (8) always implies (12). At the continuous-time data gen- 
erating level, (12) allows less stringent underlying stochastic processes than 
the Markovian process in Theorem 5. 

In particular, we identify two important scenarios where the relaxation 
happens. One scenario is to allow for more direct temporal dependence 
for the Y° process, which we will refer to as the non-Markovian-Y° case. 
The other scenario is to allow colliders in L, which we will refer to as the 
leading-indicator-in-L case. We illustrate both cases by modifying the di- 
rected acyclic graph (DAG) example in Figure 1. 

The non-Markovian-$Y^0$ case. Assume, for example, that our data are generated from the DAG in Figure 2, where we allow the dependence of $Y^0_1$ on $Y^0_0$, even if $Y^0_{1/2}$ is controlled. In part (a) of Figure 2, we control for observed covariates $(L_0, L_1)$, treatment $(A_0, A_{1/2})$ and current and historical potential outcomes $(Y^0_0, Y^0_1)$ for treatment at time 1 $(A_1)$, that is, we have controlled for all historically observed covariates, treatment and cumulative treatment as suggested in the comments accompanying Theorem 5. In this case, the modified g-estimation fails because paths like $A_1 \leftarrow L_{1/2} \leftarrow Y^0_{1/2} \rightarrow Y^0_{3/2} \rightarrow Y^0_2 \rightarrow \cdots \rightarrow Y^0_m$ are not blocked by the shaded variables. In part (b) of Figure 2, we control for the additional $Y^0_2$. $A_1$ is not completely blocked from $Y^0_m$, but some paths that are not blocked in part (a) are now blocked, for example, the path $A_1 \leftarrow L_{1/2} \leftarrow Y^0_{1/2} \rightarrow Y^0_{3/2} \rightarrow Y^0_2 \rightarrow Y^0_{5/2} \rightarrow \cdots \rightarrow Y^0_m$. Also, no additional paths are opened by conditioning on $Y^0_2$. We would usually expect that the correlation between $A_1$ and $Y^0_m$ is weakened. Under the framework of Joffe and Robins (2009), we can control for more than one period of future potential outcomes and expect to further weaken the correlation between $A_1$ and $Y^0_m$. A modification of assumption (12) that conditions on more future potential outcomes may be approximately true.

Fig. 2. Directed acyclic graph with non-Markovian $Y^0_t$. (b) Control for future $Y^0$.

The scenario relates to real-world problems. For instance, in the diarrhea
example, $Y^0$ is the natural height growth of a child without any occurrence
of diarrhea. Height in the next month not only depends on the current
month's height, but also depends on the previous month's height: the complete
historical growth curve of the child provides information on genetics
and nutritional status, and provides information about future natural height
beyond that of current natural height alone. Therefore, the potential height
process for the child is not Markovian. [For a formal argument why children's 
height growth is not Markovian, see Gasser et al. (1984).] By the reasoning 
employed above, g-estimation fails. However, if we assume that the delayed 
dependence of natural height wanes after a period of time (as in Figure 
2), controlling for the next period potential height in the propensity score 
model might weaken the relationship between current diarrhea exposure and 
future potential height later than the next period and the assumptions of 
the controlling-the-future method might hold approximately. 

The leading-indicator-in-$L$ case. In Figure 1, we do not allow any arrows from future $Y^0$ to previous $L$, which means that, among all measures of the subject, there are no elements in $L$ that contain any leading information about future $Y^0$. This means that $Y^0$ is a measure that is ahead of all other measures, by which we mean that, for example, $L_2 \perp Y^0_m \mid Y^0_2, A_{2-}, L_{2-}$ for $m > 2$. This is not realistic in many real-world problems. In the example of the effect of diarrhea on height, weight is an important covariate. While both height and weight reflect the nutritional status of a child, malnutrition usually affects weight more quickly than height, that is, the weight contains leading information for the natural height of the child. Figure 1 is thus not an appropriate model for studying the effect of diarrhea on height.

In Figure 3, we allow arrows from $Y^0_{1/2}$ to $L_0$, from $Y^0_1$ to $L_{1/2}$ and so on, which assumes that $L$ contains leading indicators of $Y^0$, but the leading indicators are only ahead of $Y^0$ for less than one unit of time. Part (a) of Figure 3 shows that controlling for the history of covariates, treatment and potential outcomes does not block $A_1$ from $Y^0_m$. On the path $A_1 \leftarrow L_{1/2} \rightarrow L_1 \leftarrow Y^0_{3/2} \rightarrow Y^0_2 \rightarrow Y^0_{5/2} \rightarrow \cdots \rightarrow Y^0_m$, $L_1$ is a controlled collider. However, in part (b), if we do control for $Y^0_2$ additionally, the same path will be blocked. In general, if we assume that there exist leading indicators in covariates and that the leading indicators are not ahead of potential outcomes for more than one time unit, g-estimation will fail, but the controlling-the-future method will produce consistent estimates.

The fact that the controlling-the-future method can work in the leading information scenario can also be related to the discussion of Section 3.6 of Rosenbaum (1984). The main reason for g-estimation's failure in the DAG example is that $L_{1/2}$ is not observable and cannot be controlled. If $L_{1/2}$ is observed, it is easy to verify that the DAG in Figure 3 satisfies sequential randomization on the finest time grid. The idea behind the controlling-the-future method is to condition on a "surrogate" for $L_{1/2}$. The surrogate should satisfy the property that $Y^0_m$ is independent of the unobserved $L_{1/2}$ given the surrogate and other observed covariates [similar to formula 3.17 in Rosenbaum (1984)]. In the leading information case, when $m > k + 1$ and we have covariates that are only ahead of the potential outcome until time at most $k + 1$, the future potential outcome $Y^0_{k+1}$ is a surrogate. It is easy to check that in Figure 3, $L_{1/2}$ is independent of $Y^0_m$, given $Y^0_2$, $L_1$, $A_0$ and $\mathrm{cum}A_1$ (equivalently, $Y^0_1$).

Fig. 3. Directed acyclic graph with leading indicator in $L_t$. (b) Control for future $Y^0$.

It is worth noting that we do not need to control for anything except $Y^0_2$
in Figure 3 in order to get a consistent estimate. It is possible to construct
more complicated DAGs in which controlling for additional past and cur- 
rent covariates is necessary, which involves more model specifications for the 
relationships among different covariates and deviates from the main point 
of this paper. 

In Section 4, we will simulate data in the non-Markovian-$Y^0_t$ and leading-indicator-in-$L_t$ cases, respectively, and show that the controlling-the-future method does produce better estimates than g-estimation. However, it is worth noting that when the modified g-estimation in Section 2.3 is consistent, the controlling-the-future estimation is usually considerably less efficient. This is because condition (12) is less stringent than (8). The semiparametric model under (8) is a submodel of the semiparametric model defined by (12). The latter will have a larger semiparametric efficiency bound than the former. Theoretically, the most efficient g-estimation will be more efficient than the most efficient controlling-the-future estimation if the g-estimation is valid. In practice, even if we are not using the most efficient estimators, controlling-the-future estimation usually estimates more parameters, for example, coefficients for $h_{i,k}(\Psi)$ in the propensity model, and thus is less efficient. For a formal discussion, see Tsiatis (2006).

4. Simulation study. We set up a simple continuous-time model that 
satisfies sequential ignorability in continuous time, and simulate and record 
discrete-time data from variations of the simple model. We estimate causal 
parameters from both the modified g-estimation and the controlling-the- 
future estimation. We also present the estimates from naive g-estimation in 
Section 2.1, where we ignore the continuous-time information of the treat- 
ment processes, as a way to show the severity of the bias in the presence 
of the measurement error problem. The results support the discussions in 
Sections 2.4 and 3. 

In the simulation models below, M1 satisfies the Markovian condition in
Theorem 5. It also serves as a proof that there exist processes satisfying the
conditions of Theorem 5.

4.1. The simulation models. We first consider a continuous-time Markov model which satisfies the CTSR assumption.

• $Y^0_t$ is the potential outcome process if the patient is not receiving any treatment. We assume that

$Y^0_t = g(V, t) + e_t,$

where $g(V, t)$ is a function of baseline covariates $V$ and time $t$. Let $g(V, t)$ be continuous in $t$ and let $e_t$ follow an Ornstein-Uhlenbeck process, that is,

$de_t = -\theta e_t\,dt + \sigma\,dW_t,$

where $W_t$ is the standard Brownian motion.

• $Y_t$ is the actual outcome process and follows the deterministic model (6):

$Y_t = Y^0_t + \Psi \int_0^t A_s\,ds.$

• $A_t$ is the treatment process, taking binary values. The jumps of the $A_t$ process follow

$P(A_s \text{ jumps once in } (t, t+h] \mid \bar A_t, \bar Y_t, Y^0) = s(A_t, Y_t)h + o(h),$

$P(A_s \text{ jumps more than once in } (t, t+h] \mid \bar A_t, \bar Y_t, Y^0) = o(h),$

where $\bar A_t$ and $\bar Y_t$ are the full continuous-time history of treatment and outcome up to time $t$ and $Y^0$ is the full continuous-time path of potential outcome from time 0 to time $K$. By making $s(\cdot)$ independent of $Y^0$, we make our model satisfy the continuous-time sequential randomization assumption. In this model, the only time-dependent confounder is the outcome process itself.
We also consider several variations of the above model (denoted as M1 below):

• Model (M2) extends (M1) to the non-Markovian-$Y^0_t$ case. Specifically, we consider the case where $e_t$ in the model of $Y^0_t$ follows a non-Markovian process, namely an Ornstein-Uhlenbeck process in random environments, which is defined as follows:

(1) $J_t$ is a continuous-time Markov process taking values in a finite set $\{1, \ldots, m\}$, which is the environment process;

(2) we have $m > 1$ sets of parameters $\theta_1, \sigma_1, \ldots, \theta_m, \sigma_m$;

(3) $e_t$ follows an Ornstein-Uhlenbeck process with parameters $\theta_j, \sigma_j$ when $J_t = j$; the starting point of each diffusion is chosen to be simply the endpoint of the previous one.

• Model (M3) extends (M1) to another setting of a non-Markovian-$Y^0_t$ process, where

$Y^0_t = g(V, t) + 0.8 e_{t-1} + 0.2 e_t.$

$e_t$ follows the same Markovian Ornstein-Uhlenbeck process as in M1. Every other variable is the same as in M1.

• Model (M4) considers the case with more than one covariate. In M4, we keep the assumptions on $Y^0_t$ as in (M1) and the deterministic model of $Y_t$. We add one more covariate, which is generated as follows:

$L^-_t = 0.2 Y^0_t + 0.8 Y^0_{t+0.5} + 0.5 \eta_t.$

$\eta_t$ follows an Ornstein-Uhlenbeck process independent of the $Y^0_t$ process. In this specification, the covariate $L^-_t$ contains some leading information about $Y^0$, but it is only ahead of $Y^0$ for 0.5 length of a time unit. Here, we use $L^-_t$ instead of $L_t$ to denote that it is the covariate excluding $Y_t$. The simulation model for the $A_t$ process is given in Appendix E.
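
The error process of M2 can be sketched in the same way. The code below is an illustrative Euler approximation of an Ornstein-Uhlenbeck process in a random environment; the function name `simulate_switching_ou`, the switching rate, the rule that the environment moves to a uniformly chosen other state, and all parameter values are assumptions for illustration rather than the settings used in the paper.

```python
import math
import random


def simulate_switching_ou(thetas, sigmas, switch_rate=1.0, e0=0.0,
                          horizon=5.0, dt=0.01, seed=0):
    """Euler path of an OU error whose parameters follow a Markov regime J_t."""
    rng = random.Random(seed)
    m = len(thetas)
    j = rng.randrange(m)                     # initial environment
    e = e0
    path = [(0.0, j, e)]
    for step in range(1, round(horizon / dt) + 1):
        # Environment process: leave the current state at rate switch_rate and
        # move to a uniformly chosen other state (an assumed jump rule).
        if m > 1 and rng.random() < switch_rate * dt:
            j = rng.choice([s for s in range(m) if s != j])
        # The diffusion continues from the endpoint of the previous regime.
        e += -thetas[j] * e * dt + sigmas[j] * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append((step * dt, j, e))
    return path


path = simulate_switching_ou(thetas=[0.5, 2.0], sigmas=[0.3, 1.0])
```

The error value alone is not Markovian here because its future law depends on the hidden environment, which is what makes the resulting potential outcome process non-Markovian.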

In all of these models, to simulate data, we use $g(V, t) = C$ (a constant), $\Psi = 1$, a time span from 0 to 5 and a sample size of 5000. Details of other parameter specifications can be found in Appendix E. We generate 5000 continuous paths of $Y_t$ and $A_t$ (and $L^-_t$ in M4), from time 0 to time 5, and record $Y^*_0, A^*_0, Y^*_1, A^*_1, \ldots, Y^*_4, A^*_4, Y^*_5$ and $\mathrm{cum}A^*_1, \ldots, \mathrm{cum}A^*_5$ (and $L^{-*}_0, \ldots, L^{-*}_5$ in M4) as the observed data.

4.2. Estimations and results under M1. Figure 4 shows a typical continuous-time path of $Y^0_t$, $Y_t$ and $A_t$. The treatment switches around time 0.7 and time 2.8.




We apply three estimating methods to data simulated from M1: the naive discrete-time g-estimation described in Section 2.1, which ignores the underlying continuous-time processes; the modified g-estimation described in Section 2.3, which controls for all the observed discrete-time history; and the controlling-the-future method in Section 3.1, which controls for the next period's potential outcome in addition to the discrete-time history.

For estimation, even though we know the data generating process, it is too complicated to use the correct model for the propensity score, that is, the correct functional form for $p_k(\Psi) = P(A^*_k \mid \bar L^*_k, \bar Y^*_k, \bar A^*_{k-1}, \mathrm{cum}A^*_k, Y^{0*}_{k+1}(\Psi))$. Therefore, we use the following approximations (note that we control for past treatment and covariates as well; see the comments on Theorem 5):

(1) standard g-estimation ignoring continuous-time processes (naive g-estimation)

$\mathrm{logit}(p_k) = \beta_0 + \beta_1 A^*_{k-1} + \beta_2 Y^*_{k-1} + \beta_3 Y^*_k;$

(2) g-estimation controlling for all observed history (modified g-estimation)

$\mathrm{logit}(p_k) = \beta_0 + \beta_1 A^*_{k-1} + \beta_2 Y^*_{k-1} + \beta_3 Y^*_k + \beta_4\,\mathrm{cum}A^*_k;$

(3) the controlling-the-future method, controlling for next period potential outcomes (controlling-the-future estimation)

$\mathrm{logit}(p_k(\Psi)) = \beta_0 + \beta_1 A^*_{k-1} + \beta_2 Y^*_{k-1} + \beta_3 Y^*_k + \beta_4\,\mathrm{cum}A^*_k + \beta_5 Y^{0*}_{k+1}(\Psi).$

We plug these models for the propensity scores into estimating equations (5), (7) and (15), respectively. [Note that in equation (5), $Y^{0*}_k(\Psi) = Y^*_k - \Psi \sum_{j=0}^{k-1} A^*_j$, while in the other two, $Y^{0*}_k(\Psi) = Y^*_k - \Psi\,\mathrm{cum}A^*_k$.]

The first panel of Table 1 shows a summary of the estimates of causal parameters
for 1000 simulations from M1. The naive g-estimation gives severely
biased estimates. Controlling for all observed history and controlling for ad- 
ditional next period potential outcome both give us unbiased estimates. As 
discussed at the end of Section 3.2, the controlling-the-future method has 
lower efficiency. 

The last row of the first panel in Table 1 shows the coverage rate of the 
95% confidence interval estimated from the 1000 independent simulations. 
Naive g-estimation has a zero coverage rate, while the other two methods 
have coverage rates around 95%. 
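
The summary rows reported for each panel of Table 1 can be computed from the replicate estimates as sketched below. This is an illustrative helper, not the authors' code: the function name `summarize` and its arguments are hypothetical, and the confidence intervals are assumed to be symmetric, so that each one is described by its half-width.

```python
import math
import statistics


def summarize(estimates, ci_halfwidths, truth=1.0):
    """Monte Carlo summary of replicate estimates of a scalar parameter."""
    n = len(estimates)
    mean = statistics.mean(estimates)
    sd = statistics.stdev(estimates)            # sample S.D. of the estimates
    covered = sum(1 for est, hw in zip(estimates, ci_halfwidths)
                  if abs(est - truth) <= hw)    # interval contains the truth
    return {
        "mean": mean,
        "sd": sd,
        "sd_of_mean": sd / math.sqrt(n),        # Monte Carlo error of the mean
        "abs_bias": abs(truth - mean),
        "coverage": covered / n,
    }


example = summarize([0.9, 1.1, 1.0, 1.2], [0.15, 0.15, 0.15, 0.15])
```

In the simulation study the list of estimates would have length 1000, one entry per simulated data set of 5000 subjects.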

4.3. Simulation results under M2 and M3. The results in the second 
panel of Table 1 are typical for different values of parameters under M2. 
The naive g-estimation performs badly, while both of the other methods 
still work well with the data generated from M2. This shows that the mod- 
ified g-estimation and the controlling-the-future method have some level of 
robustness to mild violations of the Markovian assumption. 

The third part of Table 1 shows the results of simulation from M3, where 
Y° violates the Markov property more substantially. In this case, we can see 
that the mean of the modified g-estimates is biased, but the mean of the 
controlling-the-future estimates is almost unbiased. In the last row of the 
third panel, the coverage rate for the modified g-estimation drops to 0.855, 
while the controlling-the-future method still has a coverage rate of 0.956. 

4.4. Estimations and results under M4. In M4, we create a covariate
that has leading information about $Y^0$. In the data simulated from M4, the
observational time sequential randomization (8) no longer holds, although 
the data are generated following continuous-time sequential randomization. 
This simulation serves as a numerical proof of the claim that continuous-time 
sequential randomization does not imply discrete-time sequential random- 
ization. 

To show this, we consider the following working propensity score models at time $k = 2$ and their dependence on the future potential outcome at $m = 4$:

• not controlling for the next period potential outcome (used in modified g-estimation)

(16) $\mathrm{logit}(P(A^*_k = 1 \mid \bar A^*_{k-1}, \bar L^{-*}_k, \overline{\mathrm{cum}A}^*_k, \bar Y^*_k, Y^{0*}_m)) = \beta_0 + \beta_1\,\mathrm{cum}A^*_k + \beta_2 L^{-*}_{k-1} + \beta_3 L^{-*}_k + \beta_4 A^*_{k-1} + \beta_5 Y^*_k + \beta_6 Y^*_{k-1} + \beta_8 Y^{0*}_m;$

• controlling for the next period potential outcome (used in controlling-the-future estimation)

(17) $\mathrm{logit}(P(A^*_k = 1 \mid \bar A^*_{k-1}, \bar L^{-*}_k, \overline{\mathrm{cum}A}^*_k, \bar Y^*_k, Y^{0*}_{k+1}, Y^{0*}_m)) = \beta_0 + \beta_1\,\mathrm{cum}A^*_k + \beta_2 L^{-*}_{k-1} + \beta_3 L^{-*}_k + \beta_4 A^*_{k-1} + \beta_5 Y^*_k + \beta_6 Y^*_{k-1} + \beta_7 Y^{0*}_{k+1} + \beta_8 Y^{0*}_m.$

Table 1
Estimated causal parameters from data generated by M1-M4

                                     Naive g-est.   Mod. g-est.   Ctr-future est.

Simulation results from M1, true parameter $\Psi = 1$
  Mean estimate [a]                     0.7728         1.0005         0.9988
  S.D. of estimates [b]                 0.0183         0.0191         0.0403
  S.D. of the mean estimate [c]         0.0005         0.0006         0.0013
  Absolute bias [d]                     0.2272         0.0005         0.0012
  Coverage [e]                          0              0.946          0.956

Simulation results from M2, true parameter $\Psi = 1$
  Mean estimate [a]                     0.7651         1.0016         1.0000
  S.D. of estimates [b]                 0.0132         0.0158         0.0371
  S.D. of the mean estimate [c]         0.0004         0.0005         0.0012
  Absolute bias [d]                     0.2349         0.0016         0.0000
  Coverage [e]                                         0.953          0.950

Simulation results from M3, true parameter $\Psi = 1$
  Mean estimate [a]                     0.7580         0.9845         1.0026
  S.D. of estimates [b]                 0.0149         0.0180         0.0487
  S.D. of the mean estimate [c]         0.0005         0.0006         0.0015
  Absolute bias [d]                     0.2420         0.0155         0.0026
  Coverage [e]                                         0.855          0.956

Simulation results from M4, true parameter $\Psi = 1$
  Mean estimate [a]                     0.7816         1.0853         1.0085
  S.D. of estimates [b]                 0.0201         0.0289         0.0806
  S.D. of the mean estimate [c]         0.0006         0.0009         0.0025
  Absolute bias [d]                     0.2184         0.0853         0.0085
  Coverage [e]                          0              0.115          0.948

[a] Averaged over estimates from 1000 independent simulations of sample size 5000.
[b] Sample standard deviation of the 1000 estimates.
[c] Sample S.D./$\sqrt{1000}$.
[d] Absolute value of (1 - mean estimate).
[e] Coverage rate of 95% confidence intervals for 1000 simulations.

We can use the true values of $Y^{0*}_{k+1}$ and $Y^{0*}_m$ in the regression to test the discrete-time ignorability since we are simulating the data. Table 2 shows the estimates of $\beta_7$ and $\beta_8$ in both regression models. The result shows that the coefficient of $Y^{0*}_m$, $\beta_8$, is significant if we do not control for the future potential outcome and is not significant if we control for the future potential outcome. This shows that observational time sequential randomization (8) does not hold, while the data are consistent with the revised assumption (12).

Table 2
Verification of observational time sequential randomization under M4

                 Reg. model (16)*    Reg. model (17)*
  $\beta_7$                              0.1868
  p-value                                5.56e-05
  $\beta_8$          0.0936              0.0134
  p-value            0.0006              0.691

*Simulation sample size = 10,000.
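
The logic of this check (regress treatment on a later potential outcome, with and without the next-period potential outcome, and inspect the Wald test of the later outcome's coefficient) can be sketched as follows. The toy data generator below is a hypothetical stand-in for M4 with a single unobserved leading variable `u`; the helper names `solve` and `logit_fit` and all numerical settings are assumptions for illustration.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def logit_fit(rows, y, n_iter=25):
    """Newton-Raphson logistic regression; returns (beta, se, two-sided p)."""
    p = len(rows[0])

    def score_info(beta):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for row, yi in zip(rows, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row))))
            for j in range(p):
                grad[j] += (yi - mu) * row[j]
                for k in range(p):
                    info[j][k] += mu * (1.0 - mu) * row[j] * row[k]
        return grad, info

    beta = [0.0] * p
    for _ in range(n_iter):
        grad, info = score_info(beta)
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    _, info = score_info(beta)
    # Standard errors from the diagonal of the inverse information matrix.
    se = [math.sqrt(solve(info, [1.0 if k == j else 0.0 for k in range(p)])[j])
          for j in range(p)]
    pvals = [math.erfc(abs(b / s) / math.sqrt(2.0)) for b, s in zip(beta, se)]
    return beta, se, pvals


rng = random.Random(0)
rows_a, rows_b, treat = [], [], []
for _ in range(2000):
    u = rng.gauss(0, 1)                    # unobserved leading information
    y_next = u + rng.gauss(0, 0.5)         # next-period potential outcome
    y_far = y_next + rng.gauss(0, 0.5)     # later potential outcome
    treat.append(1 if rng.random() < 1 / (1 + math.exp(-u)) else 0)
    rows_a.append([1.0, y_far])            # analogue of model (16)
    rows_b.append([1.0, y_next, y_far])    # analogue of model (17)

beta_a, se_a, p_a = logit_fit(rows_a, treat)
beta_b, se_b, p_b = logit_fit(rows_b, treat)
```

In this toy setting the coefficient of the later outcome is expected to be clearly nonzero in the first fit and close to zero in the second, mirroring the pattern in Table 2.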

The estimation results from M4 appear in the fourth panel of Table 1. In applying these methods, we use the following propensity score models separately:

(1) g-estimation ignoring the underlying continuous-time processes (naive g-estimation)

$\mathrm{logit}(p_k) = \beta_0 + \beta_1 A^*_{k-1} + \beta_2 Y^*_{k-1} + \beta_3 Y^*_k + \beta_4 L^{-*}_{k-1} + \beta_5 L^{-*}_k;$

(2) g-estimation controlling for all observed history (modified g-estimation)

$\mathrm{logit}(p_k) = \beta_0 + \beta_1 A^*_{k-1} + \beta_2 Y^*_{k-1} + \beta_3 Y^*_k + \beta_4\,\mathrm{cum}A^*_k + \beta_5 L^{-*}_{k-1} + \beta_6 L^{-*}_k;$

(3) the controlling-the-future method controlling for next period potential outcomes (controlling-the-future estimation)

$\mathrm{logit}(p_k(\Psi)) = \beta_0 + \beta_1 A^*_{k-1} + \beta_2 Y^*_{k-1} + \beta_3 Y^*_k + \beta_4\,\mathrm{cum}A^*_k + \beta_5 L^{-*}_{k-1} + \beta_6 L^{-*}_k + \beta_7 Y^{0*}_{k+1}(\Psi).$

Both the naive g-estimation and the modified g-estimation give us estimates with severe bias and they have coverage rates of 0 and 0.115, respectively, for the 95% confidence interval constructed from them. It is worth noting that model 3 is misspecified, but, nevertheless, leads to much less biased estimates, and the controlling-the-future method has a coverage rate of 0.948.

5. Application to the diarrhea data. In this section, we apply the different approaches to the diarrhea example mentioned in Section 1 (Example 2). For illustration purposes, we ignore any informative censoring and use a set of 224 children with complete records between ages 3 and 6 from 757 households in Bangladesh around 1998. The outcomes, $Y^*_k$, are the heights of the children in centimeters, measured at round $k$ of the interviews, for $k = 1, 2, 3$. The treatment $A^*_k$ at interview $k$ is defined as $A^*_k = 1$ if the child was sick with diarrhea during the two weeks before the interview and $A^*_k = 0$ otherwise. The cumulative treatment $\mathrm{cum}A^*_k$ is the number of days that the child suffered from diarrhea from four months before the first interview (July 15th, 1998) to the $k$th interview. Baseline covariates $V$ include age in months, mother's height and whether the household was exposed to the flood. Time-dependent covariates other than the outcome, that is, $L^{-*}_k$, include mid-upper arm circumference, weight for age z-score, type of toilet (open place, fixed place, unsealed toilet, water-sealed toilet or other), garbage disposal method (throwing away in own fixed place, throwing away in own nonfixed place, disposing anywhere or other method), water purifying process (filter, filter and broil, or other) and source of cooking water (from pond or river/canal, or from tube well, ring well or supply water).

We apply naive g-estimation, modified g-estimation and the controlling-the-future method to this data set. Since we only have three rounds, the actual propensity score models and the estimating equations for the three methods are as follows. Note that these estimating equations are for illustrative purposes and may not be the most efficient estimating equations for this data set.

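• Naive g-estimation uses the following propensity score model:

\[
\operatorname{logit}\{P[A^*_k = 1 \mid V, L^*_k, Y^*_k]\} = \beta_0 + \beta_V V + \beta_L L^*_k + \beta_Y Y^*_k,
\]

where $k = 1,2$.

The estimating equations follow the form of (5) in Section 2.1:

\[
\sum_{i}\ \sum_{1\le k<m\le 3}
\bigl[A^*_{k,i} - P(A^*_{k,i} = 1 \mid V_i, L^*_{k,i}, Y^*_{k,i})\bigr]
\begin{pmatrix} Y^{0*}_{m,i}(\Psi) \\ 1 \\ V_i \\ L^*_{k,i} \\ Y^*_{k,i} \end{pmatrix} = 0,
\]

where $Y^{0*}_{m,i}(\Psi) = Y^*_{m,i} - \Psi \sum_{j=1}^{m-1} A^*_{j,i}$.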

• Modified g-estimation uses this propensity score model:

\[
\operatorname{logit}\{P[A^*_k = 1 \mid V, L^*_k, Y^*_k, \mathrm{cum}A^*_k]\}
= \beta_0 + \beta_V V + \beta_L L^*_k + \beta_Y Y^*_k + \beta_{\mathrm{cum}A}\,\mathrm{cum}A^*_k,
\]

where $k = 1,2$.

The estimating equations follow the form of (7) in Section 2.3:

\[
\sum_{i}\ \sum_{1\le k<m\le 3}
\bigl[A^*_{k,i} - P(A^*_{k,i} = 1 \mid V_i, L^*_{k,i}, Y^*_{k,i}, \mathrm{cum}A^*_{k,i})\bigr]
\begin{pmatrix} Y^{0*}_{m,i}(\Psi) \\ 1 \\ V_i \\ L^*_{k,i} \\ Y^*_{k,i} \\ \mathrm{cum}A^*_{k,i} \end{pmatrix} = 0,
\]

where $Y^{0*}_{m,i}(\Psi) = Y^*_{m,i} - \Psi\,\mathrm{cum}A^*_{m,i}$.

• Controlling-the-future estimation uses the following propensity score
model:

\[
\operatorname{logit}\{P[A^*_1 = 1 \mid V, L^*_1, Y^*_1, \mathrm{cum}A^*_1, Y^{0*}_2(\Psi)]\}
= \beta_0 + \beta_V V + \beta_L L^*_1 + \beta_Y Y^*_1 + \beta_{\mathrm{cum}A}\,\mathrm{cum}A^*_1 + \beta_{Y^0} Y^{0*}_2(\Psi).
\]

The estimating equations follow (15) in Section 3:

\[
\sum_{i}
\bigl[A^*_{1,i} - P(A^*_{1,i} = 1 \mid V_i, L^*_{1,i}, Y^*_{1,i}, \mathrm{cum}A^*_{1,i}, Y^{0*}_{2,i}(\Psi))\bigr]
\begin{pmatrix} Y^{0*}_{3,i}(\Psi) \\ 1 \\ V_i \\ L^*_{1,i} \\ Y^*_{1,i} \\ \mathrm{cum}A^*_{1,i} \\ Y^{0*}_{2,i}(\Psi) \end{pmatrix} = 0,
\]

where $Y^{0*}_{m,i}(\Psi) = Y^*_{m,i} - \Psi\,\mathrm{cum}A^*_{m,i}$.

The interpretation of $\Psi$ in the last two models is that one day of suffering
from diarrhea changes the height of the child by $\Psi$ centimeters (a reduction
when $\Psi$ is negative). For naive g-estimation, the underlying data generating
model treats the exposure at the observational time as the constant exposure
level for the next six months, which does not make sense in this context. It
should be noted that if we apply the naive g-estimation, the estimated $\Psi$
should not be interpreted in the same way as in the modified g-estimation and
the controlling-the-future method. Instead, it should be interpreted as the
effect of having diarrhea at the time of the visits. The effect of the child
having diarrhea at any time between the visit and the next visit six months
later, but not at the time of the visit, is not described by this $\Psi$.

The estimating equations are solved by a Newton-Raphson algorithm.
The estimated $\Psi$ and its standard deviation are reported in Table 3.
Modified g-estimation estimates $\Psi = -0.3481$, which means that the height of
the child is reduced by 0.35 cm for each day of diarrhea. Our
controlling-the-future method produces an estimate of $\Psi = -0.0840$.
Although none of the estimates is significant because of the small sample
size, the sign and magnitude of the estimate from the controlling-the-future
method are similar to what has been found in other research on diarrhea's
effect on height [e.g., Moore et al. (2001)].
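As an illustration of how such an estimating equation can be solved by Newton-Raphson, the following is a minimal sketch on simulated data, not the implementation used for Table 3. It assumes a deliberately simplified setting: a single period, a single parameter $\Psi$, a propensity score that is known rather than estimated, and the toy model $Y = Y^0 + \Psi A$. The function names and the simulated data-generating process are hypothetical.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate(n, psi, seed=0):
    """Toy data (an assumption): A is randomized given L, and Y = Y0 + psi * A."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        L = rng.gauss(0.0, 1.0)
        y0 = L + rng.gauss(0.0, 1.0)      # centered baseline potential outcome
        p = expit(0.5 * L)                # propensity score, treated as known here
        a = 1 if rng.random() < p else 0
        rows.append((a, p, y0 + psi * a))
    return rows


def U(psi, rows):
    """Estimating function: sum_i (A_i - p_i) * Y0_i(psi), with Y0_i(psi) = Y_i - psi * A_i."""
    return sum((a - p) * (y - psi * a) for a, p, y in rows)


def newton_raphson(rows, psi=0.0, tol=1e-10, max_iter=50, eps=1e-6):
    """Solve U(psi) = 0, using a central-difference approximation to dU/dpsi."""
    for _ in range(max_iter):
        u = U(psi, rows)
        du = (U(psi + eps, rows) - U(psi - eps, rows)) / (2 * eps)
        step = u / du
        psi -= step
        if abs(step) < tol:
            break
    return psi


rows = simulate(n=20000, psi=-0.35)
psi_hat = newton_raphson(rows)
print(round(psi_hat, 3))
```

Because this toy estimating function is linear in $\Psi$, the iteration converges almost immediately; the joint systems in $(\Psi, \beta)$ described above are nonlinear and would need a multivariate version of the same update.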

In addition, we note that the standard deviation of the modified g-estimate 
is higher than that of the controlling-the-future estimate. As discussed at 
the end of Section 3.2, if the modified g-estimation is consistent, we would 
expect the controlling-the-future estimation to have larger standard devia- 
tion. The standard deviations in Table 3 provide evidence that the modified 
g-estimation is not consistent. 
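The statement that none of the estimates is significant can be checked directly from the estimates and standard errors reported in Table 3. The sketch below assumes a two-sided Wald test based on a normal approximation with critical value 1.96; the paper does not state which test was used, so this is an illustrative choice.

```python
# Estimates and standard errors as reported in Table 3.
table3 = {
    "Naive g-est.": (-0.3991, 0.2469),
    "Modified g-est.": (-0.3481, 0.2832),
    "Controlling-the-future est.": (-0.0840, 0.1894),
}


def wald_summary(estimate, std_err, z_crit=1.96):
    """Return the Wald z statistic and an approximate 95% confidence interval."""
    z = estimate / std_err
    return z, (estimate - z_crit * std_err, estimate + z_crit * std_err)


summary = {name: wald_summary(est, se) for name, (est, se) in table3.items()}
for name, (z, (lo, hi)) in summary.items():
    print(f"{name}: z = {z:.2f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

Under this approximation every interval contains zero, which agrees with the text.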

6. Conclusion. In this paper, we have studied causal inference from lon- 
gitudinal data when the underlying processes are in continuous time, but 
the covariates are only observed at discrete times. We have investigated two 
aspects of the problem. One is the validity of the discrete-time g-estimation. 
Specifically, we investigated a modified g-estimation that is in the spirit of 
standard discrete-time g-estimation, but is modified to incorporate the in- 
formation of the underlying continuous-time treatment process, which we 
have referred to as "modified g-estimation" throughout the paper. We have 
shown that an important condition that justifies this modified g-estimation 
is the finite-time sequential randomization assumption at any subset of time 
points, which is strictly stronger than the continuous-time sequential ran- 
domization. We have also shown that a Markovian assumption and the 
continuous-time sequential randomization would imply the FTSR assump- 
tion. The Markovian condition is more useful than the FTSR assumption, in 
the sense that it can potentially help researchers decide whether the appli- 
cation of the modified g-estimation is appropriate. The other aspect is the 
controlling-the-future method that we propose to use when the condition to 
warrant g-estimation does not hold. The controlling-the-future method can 
produce consistent estimates when g-estimation is inconsistent and is less 
biased in other scenarios. In particular, we identified two important cases 
in which controlling the future is less biased, namely, when there is delayed 
dependence in the baseline potential outcome process and when there are 
leading indicators of the potential outcome process in the covariate process. 

In our simulation study, we have shown the performance of the modified
g-estimation and the controlling-the-future estimation. The results confirm
our discussion in earlier sections. The simulation results also indicate the
danger of applying naive g-estimation, which is usually severely biased and
inconsistent when its underlying assumptions are violated, as in the
situations considered.

Table 3
Estimation of $\Psi$ from the diarrhea data set

Method                        Estimate   Std. err.
Naive g-est.                  -0.3991    0.2469
Modified g-est.               -0.3481    0.2832
Controlling-the-future est.   -0.0840    0.1894

We have applied the g-estimation methods and the controlling-the-future 
method to estimating the effect of diarrhea on a child's height and estimated 
that its effect is negative but not significant. The real application also pro- 
vides some evidence that the modified g-estimation is not consistent. 

All of the discussion in this paper is based on a particular form of causal
model, equation (6). However, all of the arguments could apply to a class
of more general rank-preserving models, with necessary adjustments in
various equations. If we assume a generic rank-preserving model with
$Y_t = f(Y^0_t, h(\bar{A}_{t-}); \Psi)$, where $\bar{A}_{t-}$ is the continuous-time path of $A$ from time
0 to $t-$, $h$ is some functional [e.g., in our paper, $h(\bar{A}_{t-}) = \int_0^t A_s\,ds$] and $f$ is
some strictly monotonic function with respect to the first argument [e.g., in
our paper, $f(x,y;\Psi) = x + \Psi y$], we map $Y^*_k$ to $Y^{0*}_k = f^{-1}(Y^*_k, h(\bar{A}_{k-}); \Psi)$,
where $f^{-1}$ is the inverse of $f(x,y;\Psi)$ with respect to $x$ for any given $y$.
We can then substitute all $\mathrm{cum}A^*_k$'s in this paper by the $h(\bar{A}_{k-})$'s. All of
the discussion and formulas in the paper would remain valid under the
assumption that we observe all $h(\bar{A}_{k-})$'s, which can be easily satisfied with
detailed continuous-time records of the treatment. It should be noted that
the argument does not work if a time-varying covariate modifies the effect
of treatment. For example, if $Y_t = Y^0_t + \Psi \int_0^t L_s A_s\,ds$, where $L_s$ is a
time-varying covariate, observing the full continuous-time treatment process is
not enough. Some imputation for the $L_s$ process is necessary.
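The mapping from an observed outcome back to the baseline potential outcome can be sketched as follows. The additive $f$ and the integral form of $h$ are the ones named in the text; the multiplicative $f$, the function names, the bisection bounds and the example jump times are hypothetical and serve only to show that any $f$ that is strictly increasing in its first argument can be inverted numerically.

```python
import math


def cumulative_treatment(a0, jump_times, t):
    """h(A) = integral_0^t A_s ds for a 0/1 path that starts at a0 and flips at each jump time."""
    total, state, last = 0.0, a0, 0.0
    for jump in sorted(jump_times):
        if jump >= t:
            break
        total += state * (jump - last)
        state, last = 1 - state, jump
    return total + state * (t - last)


def blip_down(y_obs, h, psi, f, lo=-1e6, hi=1e6, tol=1e-9):
    """Solve f(x, h; psi) = y_obs for x by bisection; f must be strictly increasing in x."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid, h, psi) < y_obs:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def additive(x, h, psi):
    # f(x, y; Psi) = x + Psi * y, the model used in the paper
    return x + psi * h


def multiplicative(x, h, psi):
    # Hypothetical alternative rank-preserving model, for illustration only
    return x * math.exp(psi * h)


# Treated on [1.0, 2.5) and [4.0, 5.0): 2.5 time units of treatment in total.
h = cumulative_treatment(a0=0, jump_times=[1.0, 2.5, 4.0], t=5.0)
y0_add = blip_down(100.0, h, -0.35, additive)
y0_mult = blip_down(100.0, h, -0.35, multiplicative)
print(h, round(y0_add, 3), round(y0_mult, 3))
```

For the additive model the inverse is available in closed form, $Y^0 = Y - \Psi h$, so the bisection is only needed for models without an explicit inverse.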

The methods considered here have several limitations. These include rank 
preservation, a strong assumption that the effects of treatment are determin- 
istic. This assumption facilitates the interpretation of models. In other work 
on structural nested distribution and related models [e.g., Robins (2008)], 
rank preservation has been shown to be unnecessary in settings in which one 
is not modeling the joint distribution of potential outcomes under different 
treatments. We expect that this is also the case here, and work justifying this 
more formally is in progress. We also require that the cumulative amount 
of treatment (or the full continuous-time treatment process, if using other 
causal models mentioned above) between the discrete time points when the 
covariates are observed is known. Work is in progress on the more challeng- 
ing case in which the treatment process is only observed at discrete times 
and the cumulative amount of treatment is measured with error. In addition,
we ignore censoring and require that the data be complete, a requirement that
might not be satisfied in practice. It would also be interesting to study how to
accommodate censored data in our framework in future work.
APPENDIX A: ESTIMATING COVARIANCE MATRIX OF ESTIMATED PARAMETERS

The formulas in this appendix can be used to estimate the covariance
matrix of the estimated parameters from naive g-estimation of Section 2.1,
modified g-estimation of Section 2.3 and the controlling-the-future
estimation of Section 3.1. More general results on the asymptotic covariances can
be found in van der Vaart (2000).

We write $\theta = (\Psi, \beta)$. In Sections 2.1 and 2.3, $\beta$ is the parameter in the
propensity score model. In Section 3.1, $\beta = (\beta_X, \beta_h)$ is the parameter in the
propensity score model. Let $U(\theta)$ be the vector on the left-hand side of the
estimating equations [equation (5) in Section 2.1, equation (7) in Section 2.3
and equation (15) in Section 3.1, respectively]. We also define $U_{i,k,m}(\theta)$ as the
summand of the corresponding estimating equation for subject $i$ and the pair
of time points $(k,m)$: the summand of (5) or (7) for the naive g-estimation and
the modified g-estimation, and the summand of (15) for the
controlling-the-future estimation. We then have $U(\theta) = \sum_{i,k,m} U_{i,k,m}(\theta)$.
Let $B(\theta) = E[\partial U/\partial\theta]$, which can be estimated as

\[
\hat{B}(\hat\theta) = \sum_{i,k,m} \frac{\partial U_{i,k,m}(\theta)}{\partial\theta}\Big|_{\theta=\hat\theta},
\]

where $\hat\theta$ is the solution from the corresponding estimating equations, $k < m$
in both g-estimations and $k < m-1$ in controlling-the-future estimation.
The covariance matrix of the estimator $\hat\theta$ can then be estimated as

\[
\widehat{\operatorname{Cov}}(\hat\theta) = \hat{B}^{-1}(\hat\theta)\,\widehat{\operatorname{Cov}}[U(\hat\theta)]\,\hat{B}^{-1}(\hat\theta)'
\]

by the delta method, where $\operatorname{Cov}[U(\hat\theta)]$ is estimated by

\[
\widehat{\operatorname{Cov}}[U(\hat\theta)] = \sum_i U_i(\hat\theta)\,U_i(\hat\theta)'
\]

with $U_i = \sum_{k,m} U_{i,k,m}(\hat\theta)$, $k < m$ in both g-estimations and $k < m-1$ in
controlling-the-future estimation.
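The sandwich form above can be sketched in a few lines of code. The example below is a generic illustration rather than the authors' implementation: it takes a user-supplied per-subject estimating function `u_i` (a hypothetical name), approximates $\hat{B}$ by central differences, and is exercised on a toy least-squares estimating function instead of a g-estimation one so that the result can be checked by hand.

```python
def inverse(m):
    """Gauss-Jordan inverse of a small square matrix given as a list of lists."""
    n = len(m)
    aug = [list(map(float, row)) + [1.0 if r == c else 0.0 for c in range(n)]
           for r, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def sandwich_cov(u_i, theta_hat, n_subjects, eps=1e-6):
    """B^{-1} [sum_i U_i U_i'] (B^{-1})', with B = sum_i dU_i/dtheta at theta_hat."""
    p = len(theta_hat)
    meat = [[0.0] * p for _ in range(p)]
    bread = [[0.0] * p for _ in range(p)]
    for i in range(n_subjects):
        u = u_i(i, theta_hat)
        for r in range(p):
            for c in range(p):
                meat[r][c] += u[r] * u[c]
        for c in range(p):
            up, dn = list(theta_hat), list(theta_hat)
            up[c] += eps
            dn[c] -= eps
            u_up, u_dn = u_i(i, up), u_i(i, dn)
            for r in range(p):
                bread[r][c] += (u_up[r] - u_dn[r]) / (2 * eps)
    b_inv = inverse(bread)
    b_inv_t = [list(col) for col in zip(*b_inv)]
    return matmul(matmul(b_inv, meat), b_inv_t)


# Toy estimating function (an assumption): least squares for y = a + b * x.
x = [0.0, 1.0, 2.0]
y = [0.0, 1.0, 3.0]


def u_toy(i, theta):
    resid = y[i] - theta[0] - theta[1] * x[i]
    return [resid, resid * x[i]]


theta_hat = [-1.0 / 6.0, 1.5]          # solves sum_i u_toy(i, theta) = 0
cov = sandwich_cov(u_toy, theta_hat, n_subjects=3)
print(cov)
```

For a g-estimation problem, `u_i` would return $U_i(\theta) = \sum_{k,m} U_{i,k,m}(\theta)$ for subject $i$; an analytic derivative can replace the finite difference when it is available.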

APPENDIX B: EXISTENCE OF SOLUTION AND IDENTIFICATION 

The estimating equations in this paper, equation (5) in Section 2.1, equa- 
tion (7) in Section 2.3 and equation (15) in Section 3.1, are asymptotically 
consistent systems of equations by definition, if the respective underlying 
assumptions for each estimating equation hold true. The existence of a
solution is guaranteed asymptotically. In addition, we have the same number
of equations as the number of parameters in each system. One would
usually expect there to exist a solution for the estimating equations, even in a
relatively small sample.

However, the asymptotic solution may not be unique, which leads to 
an identification problem. As a special case from the more general semi- 
parametric theory [see Tsiatis (2006)], we state the following lemma for 
identification, following the notation of Appendix A. 

Lemma 6. The parameter $\theta$ is identifiable under the model

\[
E[U(\theta)] = 0
\]

if both $\operatorname{Cov}[U(\theta_0)]$ and $B(\theta_0) = E[-\partial U(\theta_0)/\partial\theta]$ are of full rank. Here, $\theta_0$ is the
value of the true parameter.

Proof. The proof is trivial. By Appendix A, the asymptotic covariance
matrix of the estimates is given by

\[
B^{-1}(\theta_0)\operatorname{Cov}[U(\theta_0)]\,B^{-1}(\theta_0)',
\]

which will be finite and of full rank when the conditions in the lemma hold
true. □

APPENDIX C: PROOF THAT FTSR IMPLIES CTSR

We assume that $Z_t$ is a cadlag process, and everything we discuss is in an
a.s. sense.

We first define

\[
r_t(\delta) = (1 - A_{t-})A_{t+\delta} + A_{t-}(1 - A_{t+\delta}).
\]

Recall that $N_t$ counts the number of jumps in $A_t$ up to time $t$. We assume
that a continuous version of the $\mathcal{F}_{t-,t+}$ intensity process of $N_t$ exists, which
we denote by $\eta_t$. Then, under certain regularity conditions [see Chapter 2 of
Andersen et al. (1992)], for every $t$,

\[
\eta_t = \lim_{\delta\downarrow 0} \frac{E[r_t(\delta)\mid\mathcal{F}_{t-,t+}]}{\delta} \quad\text{a.s.}
\]
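The relation between $r_t(\delta)$ and an intensity can be illustrated numerically in a much simpler setting than the one treated in this appendix. The sketch below assumes a time-homogeneous, symmetric two-state Markov chain with constant flip rate `eta` and no conditioning on covariates or potential outcomes (both are simplifying assumptions, and the function names are hypothetical). For that chain $P(A_{t+\delta} \neq A_t) = (1 - e^{-2\eta\delta})/2$, and the ratio to $\delta$ approaches `eta` as $\delta \downarrow 0$.

```python
import math


def r(a_before, a_after):
    """r_t(delta) = (1 - A_{t-}) A_{t+delta} + A_{t-} (1 - A_{t+delta}) for 0/1 values."""
    return (1 - a_before) * a_after + a_before * (1 - a_after)


def switch_prob(eta, delta):
    """P(A_{t+delta} != A_t) for a symmetric two-state Markov chain with flip rate eta."""
    return -0.5 * math.expm1(-2.0 * eta * delta)


eta = 0.7
deltas = (1e-1, 1e-2, 1e-3, 1e-6)
ratios = [switch_prob(eta, d) / d for d in deltas]
for d, ratio in zip(deltas, ratios):
    print(d, ratio)
```

Here `r` is simply the indicator that the treatment status differs between the two time points, which is why its conditional expectation divided by $\delta$ behaves like a jump rate.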
For Theorem 4, we need to show that $\eta_t$ is also $\mathcal{H}_{t-}$-measurable. This is
because if this is true, then

\[
\begin{aligned}
E\Bigl[N_{t_0+s} - \int_0^{t_0+s}\eta_t\,dt \Bigm| \mathcal{H}_{t_0}\Bigr]
&= E\Bigl[N_{t_0+s} - N_{t_0} - \int_{t_0}^{t_0+s}\eta_t\,dt \Bigm| \mathcal{H}_{t_0}\Bigr]
 + E\Bigl[N_{t_0} - \int_0^{t_0}\eta_t\,dt \Bigm| \mathcal{H}_{t_0}\Bigr]\\
&= E\Bigl\{E\Bigl[N_{t_0+s} - N_{t_0} - \int_{t_0}^{t_0+s}\eta_t\,dt \Bigm| \mathcal{F}_{t_0,t_0+}\Bigr] \Bigm| \mathcal{H}_{t_0}\Bigr\}
 + N_{t_0} - \int_0^{t_0}\eta_t\,dt\\
&= N_{t_0} - \int_0^{t_0}\eta_t\,dt.
\end{aligned}
\]

The second equality follows because of properties of conditional
expectation and the assumption that $\eta_t$ is $\mathcal{H}_{t-}$-measurable. The third equality holds
because $\eta_t$ is an $\mathcal{F}_{t-,t+}$ intensity process of $N_t$. The last equality shows that
$\eta_t$ is also an $\mathcal{H}_{t-}$ intensity process of $N_t$, which agrees with the definition of
CTSR.

Before proving the main result, we assume the following regularity
conditions.

1. As stated before, we assume that $\eta_t$ is continuous. We further assume
that $\eta_t$ is positive, and bounded from below and above by constants that do
not depend on $t$. We also assume that $E[r_t(\delta)\mid\sigma(\bar{Z}_{t-},\bar{Y}^0_{t+})]/\delta$ is bounded by a
constant for every $t$ and for $\delta$ within an interval $(0,\delta_0]$.

2. We assume that for any finite sequence of time points, $t_1 < t_2 <
t_3 < \cdots < t_n$, the density $f(Z_{t_1} = z_1, Z_{t_2} = z_2, \ldots, Z_{t_n} = z_n)$ is well defined
and locally uniformly bounded, that is, there exists a constant $D$ and a
rectangle $B = [t_1 - \delta_1, t_1 + \delta_1] \times [t_2 - \delta_2, t_2 + \delta_2] \times \cdots \times [t_n - \delta_n, t_n + \delta_n]$ such
that for any $(t'_1, t'_2, \ldots, t'_n)^T \in B$ and any possible value of $(z_1, z_2, \ldots, z_n)^T$,

\[
f(Z_{t'_1} = z_1, Z_{t'_2} = z_2, \ldots, Z_{t'_n} = z_n) < D.
\]

For any conditional expectation involving a finite sequence of time points, we
choose the version that is defined by the joint density.

3. Given any finite sequence of time points, $t_1 < t_2 < t_3 < \cdots < t_n$, and
any possible value of $(z_1, z_2, \ldots, z_n)^T$, we assume that the following
convergence is uniform in a closed neighborhood of $t = (t_1, t_2, t_3, \ldots, t_n)$:

\[
f(Z_{t'_1} = z_1, Z_{t'_2} = z_2, \ldots, Z_{t'_n} = z_n)
= \lim_{\Delta\downarrow 0} \frac{P(Z_{t'_1} \in [z_1, z_1 + \Delta_1], Z_{t'_2} \in [z_2, z_2 + \Delta_2], \ldots, Z_{t'_n} \in [z_n, z_n + \Delta_n])}{\Delta_1 \times \Delta_2 \times \cdots \times \Delta_n},
\]

where $(t'_1, t'_2, \ldots, t'_n)^T$ is in a neighborhood of $t$.

4. Given any finite sequence of time points, $t_1 < t_2 < t_3 < \cdots < t_i <
\cdots < t_n$, and any possible value of $(z_1, z_2, \ldots, z_n)^T$, we define

\[
f(\delta) = \frac{P(A_{t_i+\delta} \neq A_{t_i} \mid Z_{t_1} = z_1, Z_{t_2} = z_2, \ldots, Z_{t_n} = z_n)}{\delta}.
\]

We assume that $\lim_{\delta\downarrow 0} f(\delta)$ exists and is positive and finite. We also assume
that $f(\delta)$ is finite and is right-continuous in $\delta$, and the continuity is uniform
with respect to $(\delta, t_i)$ in $[0, \delta_0] \times B(t_i)$, where $B(t_i)$ is a closed neighborhood
of $t_i$. Further, we assume that the above assumption is true if any of the $Z$
in $f$ is in its left-limit value rather than the concurrent value.

Remark 7. The third regularity condition is needed when we want to
prove convergence in density. For example, consider that when $\delta \downarrow 0$, we have
$Z_{t_2+\delta} \to Z_{t_2}$. We can then see that

\[
\begin{aligned}
\lim_{\delta\downarrow 0} & f(Z_{t_1} = z_1, Z_{t_2+\delta} = z_2, Z_{t_3} = z_3)\\
&= \lim_{\delta\downarrow 0}\ \lim_{\Delta_1,\Delta_2,\Delta_3\downarrow 0}
\frac{P(Z_{t_1} \in [z_1, z_1 + \Delta_1], Z_{t_2+\delta} \in [z_2, z_2 + \Delta_2], Z_{t_3} \in [z_3, z_3 + \Delta_3])}{\Delta_1\Delta_2\Delta_3}\\
&= \lim_{\Delta_1,\Delta_2,\Delta_3\downarrow 0}\ \lim_{\delta\downarrow 0}
\frac{P(Z_{t_1} \in [z_1, z_1 + \Delta_1], Z_{t_2+\delta} \in [z_2, z_2 + \Delta_2], Z_{t_3} \in [z_3, z_3 + \Delta_3])}{\Delta_1\Delta_2\Delta_3}\\
&= \lim_{\Delta_1,\Delta_2,\Delta_3\downarrow 0}
\frac{P(Z_{t_1} \in [z_1, z_1 + \Delta_1], Z_{t_2} \in [z_2, z_2 + \Delta_2], Z_{t_3} \in [z_3, z_3 + \Delta_3])}{\Delta_1\Delta_2\Delta_3}\\
&= f(Z_{t_1} = z_1, Z_{t_2} = z_2, Z_{t_3} = z_3).
\end{aligned}
\]

The interchanging of limits in the second equality is valid because of the
third regularity condition. The third equality follows from the fact that
probabilities are expectations of indicator functions and that the dominated
convergence theorem applies.
We introduce the following lemma for technical convenience.

Lemma 8. If the cadlag process $Z_t$ follows the finite-time sequential
randomization as defined in Definition 3, then the following version of FTSR
is also true:

\[
P(A_{t_n} \mid \bar{L}_{t_{n-1}}, L_{t_n-}, \bar{A}_{t_{n-1}}, \bar{Y}^0_{t_{n-1}}, Y^0_{t_n-}, \bar{Y}^0_{t_n+})
= P(A_{t_n} \mid \bar{L}_{t_{n-1}}, L_{t_n-}, \bar{A}_{t_{n-1}}, \bar{Y}^0_{t_{n-1}}, Y^0_{t_n-}),
\tag{18}
\]

where $\bar{L}_{t_{n-1}} = (L_{t_1}, L_{t_2}, \ldots, L_{t_{n-1}})$, $\bar{A}_{t_{n-1}} = (A_{t_1}, A_{t_2}, \ldots, A_{t_{n-1}})$,
$\bar{Y}^0_{t_{n-1}} = (Y^0_{t_1}, Y^0_{t_2}, \ldots, Y^0_{t_{n-1}})$ and $\bar{Y}^0_{t_n+} = (Y^0_{t_{n+1}}, Y^0_{t_{n+2}}, \ldots, Y^0_{t_{n+l}})$.

Remark 9. The difference between (18) and the original definition of
FTSR is that in (18), most $L$'s and $Y^0$'s are stated in their concurrent values,
while in Definition 3, they are all stated in their left limits. Lemma 8 is only
for technical convenience.

Proof of Lemma 8. The result follows directly from the definition of
a cadlag process. □

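We now consider a discrete-time property.

Lemma 10. Suppose FTSR holds true. If we define

\[
\begin{aligned}
\mathcal{F} &= \sigma(Z_{t_1}, \ldots, Z_{t_{n-1}}, Z_{t-}, Y^0_{t_{n+1}}, \ldots, Y^0_{t_{n+l}}),\\
\mathcal{H} &= \sigma(Z_{t_1}, \ldots, Z_{t_{n-1}}, Z_{t-}),
\end{aligned}
\]

then we have for every $t$ that

\[
\lim_{\delta\downarrow 0} \frac{E[r_t(\delta)\mid\mathcal{F}]}{\delta}
= \lim_{\delta\downarrow 0} \frac{E[r_t(\delta)\mid\mathcal{H}]}{\delta} \quad\text{a.s.}
\tag{19}
\]

Proof. First, we note that the limits on both sides of equation (19)
exist and are finite. This fact follows from the regularity condition 1. Take
$\lim_{\delta\downarrow 0} E[r_t(\delta)\mid\mathcal{F}]/\delta$, for example:

\[
\lim_{\delta\downarrow 0} \frac{E[r_t(\delta)\mid\mathcal{F}]}{\delta}
= \lim_{\delta\downarrow 0} \frac{E[E[r_t(\delta)\mid\sigma(\bar{Z}_{t-},\bar{Y}^0_{t+})]\mid\mathcal{F}]}{\delta}
= E\Bigl[\lim_{\delta\downarrow 0} \frac{E[r_t(\delta)\mid\sigma(\bar{Z}_{t-},\bar{Y}^0_{t+})]}{\delta} \Bigm| \mathcal{F}\Bigr].
\]

The interchange of limit and expectation is guaranteed by the assumption in
regularity condition 1 that $E[r_t(\delta)\mid\sigma(\bar{Z}_{t-},\bar{Y}^0_{t+})]/\delta$ is bounded. The existence is then
guaranteed by the dominated convergence theorem and $E[\eta_t\mid\mathcal{F}]$ is obviously
finite.

Given equation (10) and Lemma 8, we always have

\[
E[I_{A_t \neq A_{t_n}} \mid \bar{L}_{t-}, \bar{A}_{t_n}, \bar{Y}^0_{t-}, \bar{Y}^0_{t+}]
= E[I_{A_t \neq A_{t_n}} \mid \bar{L}_{t-}, \bar{A}_{t_n}, \bar{Y}^0_{t-}] \quad\text{a.s.},
\tag{20}
\]

where $\bar{L}_{t-} = (L_{t_1}, L_{t_2}, \ldots, L_{t_{n-1}}, L_{t_n}, L_{t-})^T$, $\bar{A}_{t_n} = (A_{t_1}, A_{t_2}, \ldots, A_{t_n})^T$,
$\bar{Y}^0_{t-} = (Y^0_{t_1}, Y^0_{t_2}, \ldots, Y^0_{t_n}, Y^0_{t-})$ and $\bar{Y}^0_{t+} = (Y^0_{t_{n+1}}, Y^0_{t_{n+2}}, \ldots, Y^0_{t_{n+l}})$.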

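In the regularity conditions, since we assumed the existence of a joint
density, the usual definition of conditional probability is a version of the
conditional expectation defined using $\sigma$-fields. In our case, we have

\[
\begin{aligned}
\lim_{\delta\downarrow 0} \frac{E[r_t(\delta)\mid\mathcal{F}]}{\delta}
&= \lim_{\delta\downarrow 0} \frac{P(A_{t+\delta} \neq A_{t-} \mid Z_{t_1}, \ldots, Z_{t_{n-1}}, Z_{t-}, Y^0_{t_{n+1}}, \ldots, Y^0_{t_{n+l}})}{\delta}\\
&= \lim_{\delta\downarrow 0}\ \lim_{t_n\uparrow t-} \frac{P(A_{t+\delta} \neq A_{t_n} \mid Z_{t_1}, \ldots, Z_{t_{n-1}}, Z_{t_n}, L_{t-}, Y^0_{t-}, Y^0_{t_{n+1}}, \ldots, Y^0_{t_{n+l}})}{\delta + (t - t_n)}\\
&= \lim_{t_n\uparrow t-}\ \lim_{\delta\downarrow 0} \frac{P(A_{t+\delta} \neq A_{t_n} \mid Z_{t_1}, \ldots, Z_{t_{n-1}}, Z_{t_n}, L_{t-}, Y^0_{t-}, Y^0_{t_{n+1}}, \ldots, Y^0_{t_{n+l}})}{\delta + (t - t_n)}\\
&= \lim_{t_n\uparrow t-} \frac{P(A_t \neq A_{t_n} \mid Z_{t_1}, \ldots, Z_{t_{n-1}}, Z_{t_n}, L_{t-}, Y^0_{t-}, Y^0_{t_{n+1}}, \ldots, Y^0_{t_{n+l}})}{t - t_n}\\
&= \lim_{t_n\uparrow t-} \frac{E[I_{A_t \neq A_{t_n}} \mid \bar{L}_{t-}, \bar{A}_{t_n}, \bar{Y}^0_{t-}, \bar{Y}^0_{t+}]}{t - t_n}.
\end{aligned}
\]

The second equality is guaranteed by the third regularity condition. By
Remark 7, we can show that the conditional density in the third line converges
to the second line as $A_{t_n}$ and $Z_{t_n}$ converge to $A_{t-}$ and $Z_{t-}$. The $(t - t_n)$
term in the denominator is not needed for the second equality, but is crucial
for the interchangeability of limits in the third equality. The
interchangeability of limits is guaranteed by the fourth regularity condition. By the fourth
regularity condition, the following limit

\[
\lim_{\delta\downarrow 0} \frac{P(A_{t+\delta} \neq A_{t_n} \mid Z_{t_1}, \ldots, Z_{t_{n-1}}, Z_{t_n}, Z_{t-}, Z_{t_{n+1}}, \ldots, Z_{t_{n+l}})}{\delta + (t - t_n)}
= \frac{P(A_t \neq A_{t_n} \mid Z_{t_1}, \ldots, Z_{t_{n-1}}, Z_{t_n}, Z_{t-}, Z_{t_{n+1}}, \ldots, Z_{t_{n+l}})}{t - t_n}
\]

is uniform in $t_n$.

If we integrate out some extra variables, we can get that

\[
\lim_{\delta\downarrow 0} \frac{P(A_{t+\delta} \neq A_{t_n} \mid Z_{t_1}, \ldots, Z_{t_{n-1}}, Z_{t_n}, L_{t-}, Y^0_{t-}, Y^0_{t_{n+1}}, \ldots, Y^0_{t_{n+l}})}{\delta + (t - t_n)}
= \frac{P(A_t \neq A_{t_n} \mid Z_{t_1}, \ldots, Z_{t_{n-1}}, Z_{t_n}, L_{t-}, Y^0_{t-}, Y^0_{t_{n+1}}, \ldots, Y^0_{t_{n+l}})}{t - t_n}
\]

is uniform in $t_n$.

Therefore, we can interchange the limits in the third equality.
Similarly, we can prove that

\[
\lim_{\delta\downarrow 0} \frac{E[r_t(\delta)\mid\mathcal{H}]}{\delta}
= \lim_{t_n\uparrow t-} \frac{E[I_{A_t \neq A_{t_n}} \mid \bar{L}_{t-}, \bar{A}_{t_n}, \bar{Y}^0_{t-}]}{t - t_n}.
\]

Therefore, we have

\[
\begin{aligned}
\lim_{\delta\downarrow 0} \frac{E[r_t(\delta)\mid\mathcal{F}]}{\delta}
&= \lim_{t_n\uparrow t-} \frac{E[I_{A_t \neq A_{t_n}} \mid \bar{L}_{t-}, \bar{A}_{t_n}, \bar{Y}^0_{t-}, \bar{Y}^0_{t+}]}{t - t_n}\\
&= \lim_{t_n\uparrow t-} \frac{E[I_{A_t \neq A_{t_n}} \mid \bar{L}_{t-}, \bar{A}_{t_n}, \bar{Y}^0_{t-}]}{t - t_n}\\
&= \lim_{\delta\downarrow 0} \frac{E[r_t(\delta)\mid\mathcal{H}]}{\delta}.
\end{aligned}
\]

The second equality comes from (20). □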



We now prove the final key lemma.

Lemma 11. Given FTSR, $\eta_t$ is $\mathcal{H}_{t-}$-measurable.

Proof. We prove the result by using the definition of a measurable
function with respect to a $\sigma$-field.

For any $a \in \mathbb{R}$, consider the following set:

\[
B = \{\omega : \eta_t < a\}.
\]

Since $\eta_t$ is measurable with respect to $\mathcal{F}_{t-,t+}$, $B \in \mathcal{F}_{t-,t+}$.

By Lemma 25.9 of Rogers and Williams (1994), $B$ is a $\sigma$-cylinder and
it can be decided by variables from countably many time points. Suppose
the collection of these countably many time points is $S$. $S = S_1 \cup S_2$, where
$t_{1,i} < t$ for $t_{1,i} \in S_1$ and $t_{2,j} > t$ for $t_{2,j} \in S_2$.

Let $\mathcal{F}_S$ denote the $\sigma$-field generated by $(Z_{t_{1,i}}, i \in \mathbb{N}; Z_{t-}; Y^0_{t_{2,j}}, j \in \mathbb{N})$. We
have augmented the $\sigma$-field generated by variables from $S$ with $Z_{t-}$.

Next, define the following series of $\sigma$-fields:

\[
\begin{aligned}
\mathcal{F}_1 &= \sigma(Z_{t_{1,1}}, Z_{t-}, Y^0_{t_{2,1}}),\\
\mathcal{F}_2 &= \sigma(\mathcal{F}_1, Z_{t_{1,2}}, Y^0_{t_{2,2}}),\\
&\ \ \vdots\\
\mathcal{F}_\infty &= \mathcal{F}_S.
\end{aligned}
\]

Considering the following sets:

\[
\begin{aligned}
B_1 &= \{\omega : E[\eta_t\mid\mathcal{F}_1] < a\},\\
B_2 &= \{\omega : E[\eta_t\mid\mathcal{F}_2] < a\},\\
&\ \ \vdots\\
B_S = B_\infty &= \{\omega : E[\eta_t\mid\mathcal{F}_S] < a\}.
\end{aligned}
\]

We have $B_k \in \mathcal{F}_k$.
It is easy to see that

\[
B_1 \supset B_2 \supset \cdots \supset B_S
\]

because

\[
E[E[\eta_t\mid\mathcal{F}_k]\mid\mathcal{F}_{k-1}] = E[\eta_t\mid\mathcal{F}_{k-1}]
\]

and taking conditional expectation preserves the direction of inequality.

Also, with the above definitions, $\mathcal{F}_k \uparrow \mathcal{F}_S$. Therefore, by Theorem 5.7 from
Durrett [(2005), Chapter 4], we know that

\[
E[\eta_t\mid\mathcal{F}_k] \to E[\eta_t\mid\mathcal{F}_S] \quad\text{a.s.}
\]

It is then easy to see that $1_{B_k} \to 1_{B_S}$ a.s. and that

\[
B_S = \bigcap_{i=1}^{\infty} B_i
\]

with difference up to a null set.
We now claim that

\[
B_S = B
\tag{21}
\]

with difference up to a null set.

Obviously, $B \subset B_S$. Suppose that $P(B_S - B) > 0$. Since $B_S - B \in \mathcal{F}_S$, we
have

\[
\int_{B_S - B} \eta_t\,P(d\omega) = \int_{B_S - B} E[\eta_t\mid\mathcal{F}_S]\,P(d\omega).
\]

Then

\[
\mathrm{LHS} \ge aP(B_S - B)
\]

and

\[
\mathrm{RHS} < aP(B_S - B).
\]

This is a contradiction.

Therefore, $B = \bigcap_{i=1}^{\infty} B_i$ with difference up to a null set.

Next, we define

\[
\begin{aligned}
\mathcal{H}_1 &= \sigma(Z_{t_{1,1}}, Z_{t-}),\\
\mathcal{H}_2 &= \sigma(\mathcal{H}_1, Z_{t_{1,2}}),\\
&\ \ \vdots
\end{aligned}
\]

Given FTSR, by Lemma 10, we have

\[
E[\eta_t\mid\mathcal{F}_k] = E[\eta_t\mid\mathcal{H}_k].
\]

Therefore, every $B_k \in \mathcal{H}_k$ and thus $B_k \in \mathcal{H}_{t-}$.

Since $B = \bigcap_{i=1}^{\infty} B_i$, $B \in \mathcal{H}_{t-}$ as well. By the definition of a measurable
function, $\eta_t$ is measurable with respect to $\mathcal{H}_{t-}$. □

Combining all of the results in this appendix, we have proven Theorem 4. 

APPENDIX D: PROOF OF THEOREM 5 

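Let $\mathcal{G}_t = \sigma(Y^0_{t-}, L_{t-}, A_{t-})$. Recall the definition of
$r_t(\delta) = (1 - A_{t-})A_{t+\delta} + A_{t-}(1 - A_{t+\delta})$ and that $Z_t = (Y^0_t, L_t, A_t)^T$. By the
Markovian property and the cadlag property, it is easy to show that

\[
E[r_t(\delta)\mid\sigma(\bar{Z}_{t-})] = E[r_t(\delta)\mid\mathcal{G}_t]
\]

and that

\[
E[r_t(\delta)\mid\sigma(\bar{Z}_{t-}, Y^0_{t+s})] = E[r_t(\delta)\mid\sigma(\mathcal{G}_t, Y^0_{t+s})].
\]

Note that, without loss of generality, we only consider $Y^0_{t+s}$ in the proof,
rather than $\bar{Y}^0_{t+}$.

Therefore, we have a reduced form of continuous-time sequential
randomization:

\[
\lim_{\delta\downarrow 0} \frac{E[r_t(\delta)\mid\sigma(\mathcal{G}_t, Y^0_{t+s})]}{\delta}
= \lim_{\delta\downarrow 0} \frac{E[r_t(\delta)\mid\sigma(\bar{Z}_{t-}, Y^0_{t+s})]}{\delta}
= \lim_{\delta\downarrow 0} \frac{E[r_t(\delta)\mid\sigma(\bar{Z}_{t-})]}{\delta}
= \lim_{\delta\downarrow 0} \frac{E[r_t(\delta)\mid\mathcal{G}_t]}{\delta}.
\]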



If we can show (22), then we can conclude (11). The reason is as follows:
assuming (22) to be true, we integrate $A_{t-}$ out on both sides of the equation.
We will get

\[
f(Y^0_{t+s} \mid Y^0_{t-}, L_{t-})\,P(A_t \mid Y^0_{t-}, L_{t-}) = f(Y^0_{t+s}, A_t \mid Y^0_{t-}, L_{t-}).
\]

Dividing the above equation by $f(Y^0_{t+s} \mid Y^0_{t-}, L_{t-})$, we obtain (11).
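Consider

\[
g(\delta_1, \delta_2) = f(Y^0_{t+s} \mid A_{t+\delta_1} = a_1, A_{t-\delta_2} = a_2, Y^0_{t-}, L_{t-}),
\]

where $\delta_1 > 0$ and $\delta_2 > 0$.
We observe that

\[
\begin{aligned}
\lim_{\delta_1\downarrow 0}\ \lim_{\delta_2\downarrow 0} g(\delta_1, \delta_2)
&= \lim_{\delta_1\downarrow 0} f(Y^0_{t+s} \mid A_{t+\delta_1} = a_1, A_{t-} = a_2, Y^0_{t-}, L_{t-})\\
&= \lim_{\delta_1\downarrow 0} \frac{f(Y^0_{t+s}, A_{t+\delta_1} = a_1 \mid A_{t-} = a_2, Y^0_{t-}, L_{t-})}{P(A_{t+\delta_1} = a_1 \mid A_{t-} = a_2, Y^0_{t-}, L_{t-})}\\
&= f(Y^0_{t+s} \mid A_{t-} = a_2, Y^0_{t-}, L_{t-})
 \times \lim_{\delta_1\downarrow 0} \frac{P(A_{t+\delta_1} = a_1 \mid Y^0_{t+s}, A_{t-} = a_2, Y^0_{t-}, L_{t-})}{P(A_{t+\delta_1} = a_1 \mid A_{t-} = a_2, Y^0_{t-}, L_{t-})}\\
&= \begin{cases}
f(Y^0_{t+s} \mid A_{t-} = a_2, Y^0_{t-}, L_{t-})
 \times \displaystyle\lim_{\delta_1\downarrow 0} \frac{1 - P(A_{t+\delta_1} \neq A_{t-} \mid Y^0_{t+s}, A_{t-} = a_2, Y^0_{t-}, L_{t-})}{1 - P(A_{t+\delta_1} \neq A_{t-} \mid A_{t-} = a_2, Y^0_{t-}, L_{t-})},
 & \text{if } a_1 = a_2,\\[2ex]
f(Y^0_{t+s} \mid A_{t-} = a_2, Y^0_{t-}, L_{t-})
 \times \displaystyle\lim_{\delta_1\downarrow 0} \frac{P(A_{t+\delta_1} \neq A_{t-} \mid Y^0_{t+s}, A_{t-} = a_2, Y^0_{t-}, L_{t-})/\delta_1}{P(A_{t+\delta_1} \neq A_{t-} \mid A_{t-} = a_2, Y^0_{t-}, L_{t-})/\delta_1},
 & \text{if } a_1 \neq a_2,
\end{cases}\\
&= f(Y^0_{t+s} \mid A_{t-} = a_2, Y^0_{t-}, L_{t-}).
\end{aligned}
\]

Here, the validity of taking the limit inside the density is guaranteed by
the third regularity condition, and the last equality follows because of the
continuous-time sequential randomization assumption.
We also observe that

\[
\begin{aligned}
\lim_{\delta_2\downarrow 0}\ \lim_{\delta_1\downarrow 0} g(\delta_1, \delta_2)
&= \lim_{\delta_2\downarrow 0} f(Y^0_{t+s} \mid A_{t-\delta_2}, A_t, Y^0_{t-}, L_{t-})\\
&= \lim_{\delta_2\downarrow 0} f(Y^0_{t+s} \mid A_t, Y^0_{t-}, L_{t-})
 = f(Y^0_{t+s} \mid A_t, Y^0_{t-}, L_{t-}).
\end{aligned}
\]

The second equality uses the Markov property.

If we can interchange the limits, then we have

\[
f(Y^0_{t+s} \mid A_{t-}, Y^0_{t-}, L_{t-}) = f(Y^0_{t+s} \mid A_t, Y^0_{t-}, L_{t-}).
\]

Equation (22) follows from the definition of conditional density.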
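We now establish the fact that

\[
\lim_{\delta_2\downarrow 0}\ \lim_{\delta_1\downarrow 0} g(\delta_1, \delta_2) = \lim_{\delta_1\downarrow 0}\ \lim_{\delta_2\downarrow 0} g(\delta_1, \delta_2)
\]

by showing that $\lim_{\delta_1\downarrow 0} g(\delta_1, \delta_2)$ is uniform in $\delta_2$.
If we define $g_1(\delta_2) = \lim_{\delta_1\downarrow 0} g(\delta_1, \delta_2)$, then

\[
\begin{aligned}
|g(\delta_1, \delta_2) - g_1(\delta_2)|
&= \biggl| \frac{f(Y^0_{t+s}, A_{t+\delta_1} = a_1 \mid A_{t-\delta_2} = a_2, Y^0_{t-}, L_{t-})}{P(A_{t+\delta_1} = a_1 \mid A_{t-\delta_2} = a_2, Y^0_{t-}, L_{t-})}
 - \frac{f(Y^0_{t+s}, A_t = a_1 \mid A_{t-\delta_2} = a_2, Y^0_{t-}, L_{t-})}{P(A_t = a_1 \mid A_{t-\delta_2} = a_2, Y^0_{t-}, L_{t-})} \biggr|\\
&= f(Y^0_{t+s} \mid A_{t-\delta_2} = a_2, Y^0_{t-}, L_{t-})\\
&\quad \times \biggl| \frac{P(A_{t+\delta_1} = a_1 \mid A_{t-\delta_2} = a_2, Y^0_{t-}, L_{t-}, Y^0_{t+s})}{P(A_{t+\delta_1} = a_1 \mid A_{t-\delta_2} = a_2, Y^0_{t-}, L_{t-})}
 - \frac{P(A_t = a_1 \mid A_{t-\delta_2} = a_2, Y^0_{t-}, L_{t-}, Y^0_{t+s})}{P(A_t = a_1 \mid A_{t-\delta_2} = a_2, Y^0_{t-}, L_{t-})} \biggr|.
\end{aligned}
\]

Consider the ratio

\[
\frac{P(A_{t+\delta_1} = a_1 \mid A_{t-\delta_2} = a_2, Y^0_{t-}, L_{t-}, Y^0_{t+s})}{P(A_{t+\delta_1} = a_1 \mid A_{t-\delta_2} = a_2, Y^0_{t-}, L_{t-})}.
\]

We claim that it converges to

\[
\frac{P(A_t = a_1 \mid A_{t-\delta_2} = a_2, Y^0_{t-}, L_{t-}, Y^0_{t+s})}{P(A_t = a_1 \mid A_{t-\delta_2} = a_2, Y^0_{t-}, L_{t-})}
\]

uniformly in $\delta_2$.

If $a_1 = a_2$, then the density $P(A_{t+\delta_1} = a_1 \mid A_{t-\delta_2} = a_2, Y^0_{t-}, L_{t-})$ is bounded
from below by a positive number. By the fourth regularity condition,

\[
P(A_{t+\delta_1} = a_1 \mid A_{t-\delta_2} = a_2, Y^0_{t-}, L_{t-}, Y^0_{t+s}) \to P(A_t = a_1 \mid A_{t-\delta_2} = a_2, Y^0_{t-}, L_{t-}, Y^0_{t+s})
\]

and

\[
P(A_{t+\delta_1} = a_1 \mid A_{t-\delta_2} = a_2, Y^0_{t-}, L_{t-}) \to P(A_t = a_1 \mid A_{t-\delta_2} = a_2, Y^0_{t-}, L_{t-}),
\]

uniformly in $\delta_2$, as $\delta_1 \downarrow 0$. When the denominators are bounded from below
by a positive number, the ratio also converges uniformly.
If $a_1 \neq a_2$, then, by the fourth regularity condition, we have

\[
\frac{P(A_{t+\delta_1} = a_1 \mid A_{t-\delta_2} = a_2, Y^0_{t-}, L_{t-}, Y^0_{t+s})}{\delta_1 + \delta_2}
\to \frac{P(A_t = a_1 \mid A_{t-\delta_2} = a_2, Y^0_{t-}, L_{t-}, Y^0_{t+s})}{\delta_2}
\]

and

\[
\frac{P(A_{t+\delta_1} = a_1 \mid A_{t-\delta_2} = a_2, Y^0_{t-}, L_{t-})}{\delta_1 + \delta_2}
\to \frac{P(A_t = a_1 \mid A_{t-\delta_2} = a_2, Y^0_{t-}, L_{t-})}{\delta_2}
\]

uniformly in $\delta_2$, as $\delta_1 \downarrow 0$. Also, the denominator
$P(A_{t+\delta_1} = a_1 \mid A_{t-\delta_2} = a_2, Y^0_{t-}, L_{t-})/(\delta_1 + \delta_2)$
is bounded from below by a positive number. Hence, we establish the uniform
convergence of the ratio.

Combining the two cases above, $|g(\delta_1, \delta_2) - g_1(\delta_2)|$ is bounded by $O(\delta_1)$,
which does not depend on $\delta_2$, so $g(\delta_1, \delta_2) \to g_1(\delta_2)$ uniformly in $\delta_2$. Therefore,

\[
\lim_{\delta_2\downarrow 0}\ \lim_{\delta_1\downarrow 0} g(\delta_1, \delta_2) = \lim_{\delta_1\downarrow 0}\ \lim_{\delta_2\downarrow 0} g(\delta_1, \delta_2).
\]

By the argument at the beginning of the proof, we have proven the first part
of the theorem.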

To show that (11) implies FTSR, without loss of generality, we consider

\[
\begin{aligned}
P(A_t & \mid L_{t-}, Y^0_{t-}, A_{t-m}, L_{(t-m)-}, Y^0_{(t-m)-}, Y^0_{t+s})\\
&= \frac{f(A_t, L_{t-}, Y^0_{t-}, A_{t-m}, L_{(t-m)-}, Y^0_{(t-m)-}, Y^0_{t+s})}{\sum_{i=0,1} f(A_t = i, L_{t-}, Y^0_{t-}, A_{t-m}, L_{(t-m)-}, Y^0_{(t-m)-}, Y^0_{t+s})}\\
&= \frac{f(Y^0_{t+s} \mid A_t, L_{t-}, Y^0_{t-})\,f(A_t, L_{t-}, Y^0_{t-}, A_{t-m}, L_{(t-m)-}, Y^0_{(t-m)-})}{\sum_{i=0,1} f(Y^0_{t+s} \mid A_t = i, L_{t-}, Y^0_{t-})\,f(A_t = i, L_{t-}, Y^0_{t-}, A_{t-m}, L_{(t-m)-}, Y^0_{(t-m)-})}\\
&= \frac{f(Y^0_{t+s} \mid L_{t-}, Y^0_{t-})\,f(A_t, L_{t-}, Y^0_{t-}, A_{t-m}, L_{(t-m)-}, Y^0_{(t-m)-})}{\sum_{i=0,1} f(Y^0_{t+s} \mid L_{t-}, Y^0_{t-})\,f(A_t = i, L_{t-}, Y^0_{t-}, A_{t-m}, L_{(t-m)-}, Y^0_{(t-m)-})}\\
&= \frac{f(A_t, L_{t-}, Y^0_{t-}, A_{t-m}, L_{(t-m)-}, Y^0_{(t-m)-})}{\sum_{i=0,1} f(A_t = i, L_{t-}, Y^0_{t-}, A_{t-m}, L_{(t-m)-}, Y^0_{(t-m)-})}\\
&= P(A_t \mid L_{t-}, Y^0_{t-}, A_{t-m}, L_{(t-m)-}, Y^0_{(t-m)-}).
\end{aligned}
\]

The second equality follows because of the Markov property. The third
equality uses equation (11). We have thus proven the second half of the theorem.

APPENDIX E: SIMULATION PARAMETERS 

In all simulation models from M1 to M4, we specify the parameters as
follows:

• let $g(V,t) = C$, a constant; let $C = 100$;

• for M1 (also for M3 and M4), let $\theta = 0.2$ and $\sigma = 1$;

• for M2, let m = 2, Q x = 0.2, a\ = 1 and 6 2 = 1, a 2 = 0.5. The transi- 



tion probability of Jt would be P(t) = e , where A = I 1 _ 1 J ; 

• for initial value, cq is generated from N(0, -j=); 

• the causal parameter $\Psi = 1$;

• in M1, M2 and M3, $s(A_t, Y_t) = e^{\alpha_0 + \alpha_1 A_t + \alpha_2 Y_t + \alpha_3 A_t Y_t}$; let $\alpha_1 = -0.3$,
$\alpha_2 = -0.005$, $\alpha_3 = 0.007$ and $\alpha_0 = -0.2$;

• in M4, $A_t$ is generated as follows: if $Y_{t-0.5} > 101$ and $Y_t > 101$,
$s(A_t = 1, L^*_t) = 2.8$; if $Y_{t-0.5} < 99$ and $Y_t < 99$, $s(A_t = 0, L^*_t) = 2.8$; otherwise, $A_t$ is
generated following a model similar to that in M1, except that
$s(A_t, L^*_t) = e^{\alpha_0 + \alpha_1 A_t + \alpha_2 L^*_t + \alpha_3 A_t L^*_t}$. The values of the $\alpha$'s are the same as before;

• in M4, $\eta_t$ follows an Ornstein-Uhlenbeck process with parameters
$\theta = 0.2$ and $\sigma = 1$;

• for initial value, $A_0$ is generated from Bernoulli$(\operatorname{expit}(\alpha_0 + \alpha_2 Y_0))$;

• K = 5 is the number of periods; 

• number of subjects n = 5000. 
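For readers who want to generate a process with the Ornstein-Uhlenbeck parameters listed above, the following is a minimal sketch using the exact one-step transition of the process. It assumes the standard parameterization $dX_t = -\theta X_t\,dt + \sigma\,dW_t$, a stationary starting value and a step size of 0.5; the paper's parameterization, initial distribution and time grid may differ, and the function names are hypothetical.

```python
import math
import random


def ou_transition(theta, sigma, dt):
    """Exact one-step transition of dX = -theta * X dt + sigma dW over a step of length dt."""
    decay = math.exp(-theta * dt)
    sd = sigma * math.sqrt((1.0 - decay ** 2) / (2.0 * theta))
    return decay, sd


def simulate_ou(theta, sigma, dt, n_steps, seed=0):
    """Simulate the process on a regular grid, starting from the stationary law (an assumption)."""
    rng = random.Random(seed)
    decay, sd = ou_transition(theta, sigma, dt)
    x = rng.gauss(0.0, sigma / math.sqrt(2.0 * theta))
    path = [x]
    for _ in range(n_steps):
        x = decay * x + sd * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


theta, sigma = 0.2, 1.0
decay, sd = ou_transition(theta, sigma, dt=0.5)
stationary_var = sigma ** 2 / (2.0 * theta)
path = simulate_ou(theta, sigma, dt=0.5, n_steps=10)
print(stationary_var, len(path))
```

Using the exact transition rather than an Euler step keeps the stationary variance $\sigma^2/(2\theta)$ unchanged for any step size.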